Enhancing open data with identifiers

2014-10-31 Thread Leigh Dodds
I thought I'd share a link to this UK ODI/Thomson Reuters white paper
which was published today:

http://theodi.org/guides/data-identifiers-white-paper

Cheers,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Dbpedia is down?

2014-10-11 Thread Leigh Dodds
Hi,

Dbpedia has been down for maintenance since yesterday evening. Does
anyone know when it will be back up?

All resource URIs return:

The web-site you are currently trying to access is under maintenance
at this time. We are sorry for any inconvenience this has caused.

I'd have reported this to the bug tracker listed on the dbpedia
support page, but that link is also broken:

http://sourceforge.net/tracker/?group_id=190976

Is there a location where planned maintenance is noted? Similarly, is
there somewhere to go to check service status and updates on fault
finding?

Thanks,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: URIs within URIs

2014-08-28 Thread Leigh Dodds
Hi,

I documented all the variations of this form of URI construction I was
aware of in the Rebased URI pattern:

http://patterns.dataincubator.org/book/rebased-uri.html

This covers generating one URI from another. What that new URI returns
is a separate concern.

Cheers,

L.

On Fri, Aug 22, 2014 at 4:56 PM, Bill Roberts b...@swirrl.com wrote:
 Hi Luca

 We certainly find a need for that kind of feature (as do many other linked 
 data publishers) and our choice in our PublishMyData platform has been the 
 URL pattern {domain}/resource?uri={url-encoded external URI} to expose info 
 in our databases about URIs in other domains.

 If there was a standard URL route for this scenario, we'd be glad to 
 implement it.

 Best regards

 Bill

 On 22 Aug 2014, at 16:44, Luca Matteis lmatt...@gmail.com wrote:

 Dear LOD community,

 I'm wondering whether there has been any research regarding the idea
 of having URIs contain an actual URI, that would then resolve
 information about what the linked dataset states about the input URI.

 Example:

 http://foo.com/alice - returns data about what foo.com has regarding alice

 http://bar.com/endpoint?uri=http%3A%2F%2Ffoo.com%2Falice - doesn't
 just resolve the alice URI above, but returns what bar.com wants to
 say about the alice URI

 For that matter http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice could 
 return:

 <http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice> a void:Dataset .
 <http://foo.com/alice> #some #data .

 I know SPARQL endpoints already have this functionality, but was
 wondering whether any formal research was done towards this direction
 rather than a full-blown SPARQL endpoint.

 The reason I'm looking for this sort of thing is because I simply need
 to ask certain third-party datasets whether they have data about a URI
 (inbound links).

 Best,
 Luca






-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



ORCID as Linked Data

2014-06-17 Thread Leigh Dodds
I discovered this today:

curl -v -L -H "Accept: text/turtle" http://orcid.org/0000-0003-0837-2362

A fairly new addition to the ORCID service I think.

With many DOIs already supporting Linked Data views, this makes a nice
addition to the academic linked data landscape.

Still lots of room for improvement, but definitely a step forwards.

Cheers,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: rdf:HTML datatype in RDF 1.1

2014-04-02 Thread Leigh Dodds
The value space is defined as being a DocumentFragment. I'm not clear
on whether DOM4 has changed the meaning of that, but a fragment is a
collection of nodes, which don't necessarily have a common root
element.

So I think either is valid.
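
For example, I'd expect both of these to be acceptable rdf:HTML
literals (the ex:content property is just for illustration):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/ns#> .

ex:note1 ex:content "<p>Hello world!</p>"^^rdf:HTML .
ex:note2 ex:content "<b>Hello</b> <i>world</i>!"^^rdf:HTML .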

L.

On Wed, Apr 2, 2014 at 11:54 AM, john.walker john.wal...@semaku.com wrote:
 Simple question on this which wasn't immediately obvious from the
 recommendation [1].

 Is it expected that the string has a single top-level element:

 <p>Hello world!</p>

 Or is it OK to include fragments like:

 Hello world!
 <b>Hello</b> world!
 Hello <i>world</i>!
 <b>Hello</b> <i>world</i>!

 Regards,

 John

 [1] http://www.w3.org/TR/rdf11-concepts/#section-html



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Exchanging Links with LINK and UNLINK

2013-12-11 Thread Leigh Dodds
Hi,

The HTTP Link and Unlink Methods draft [1] specifies how to use the
LINK/UNLINK HTTP methods to support exchanging links between resources
on the web.

To explore these ideas I've created a Ruby implementation based on
Rack middleware, which means it can be easily integrated into any
Ruby-based web framework [2].

There are a couple of link stores provided, including one based on a
SPARQL 1.1 compliant endpoint.

Supplemented with suitable authentication, I think this provides an
interesting way to exchange links between Linked Data publishers. No
special mechanism is needed, just existing protocols. It's nicely
aligned with existing web infrastructure.
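
As a rough sketch of the exchange (the URIs here are illustrative):

LINK /datasets/mine HTTP/1.1
Host: data.example.org
Link: <http://other.example.net/datasets/theirs>; rel="related"

A 200 response indicates the link has been recorded; UNLINK works the
same way to remove one.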

I thought I'd share this with the community as I don't feel we've
settled on a common pattern for exchanging this kind of information
between publishers.

Cheers,

L.

[1]. http://tools.ietf.org/html/draft-snell-link-method-08
[2]. https://github.com/ldodds/link-middleware

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: How to publish SPARQL endpoint limits/metadata?

2013-10-08 Thread Leigh Dodds
Hi,

As others have suggested, extending service descriptions would be the
best way to do this. This might make a nice little community project.

It would be useful to itemise a list of the types of limits that
might be faced, then look at how best to model them.

Perhaps something we could do on the list?
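
To make that concrete, here's a sketch of what an extended service
description might look like. The sd: terms are from the standard
SPARQL 1.1 Service Description vocabulary; the lim: terms are invented
purely for illustration:

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix lim: <http://example.org/ns/limits#> .

<http://example.org/sparql> a sd:Service ;
  sd:endpoint <http://example.org/sparql> ;
  lim:maxResultRows 10000 ;
  lim:queryTimeoutSeconds 60 .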

Cheers,

L.



On Tue, Oct 8, 2013 at 10:46 AM, Frans Knibbe | Geodan
frans.kni...@geodan.nl wrote:
 Hello,

 I am experimenting with running SPARQL endpoints and I notice the need to
 impose some limits to prevent overloading/abuse. The easiest and I believe
 fairly common way to do that is to LIMIT the number of results that the
 endpoint will return for a single query.

 I now wonder how I can publish the fact that my SPARQL endpoint has a LIMIT
 and that it has a certain value.

 I have read the thread "Public SPARQL endpoints: managing (mis)-use and
 communicating limits to users", but that seemed to be about how to
 communicate limits during querying. I would like to know if there is a way
 to communicate limits before querying is started.

 It seems to me that a logical place to publish a limit would be in the
 metadata of the SPARQL endpoint. Those metadata could contain all limits
 imposed on the endpoint, and perhaps other things like an SLA or a
 maintenance schedule... data that could help in the proper use of the
 endpoint by both software agents and human users.

 So perhaps my enquiry really is about a standard for publishing SPARQL
 endpoint metadata, and how to access them.

 Greetings,
 Frans


 --
 Geodan
 President Kennedylaan 1
 1079 MB Amsterdam (NL)

 T +31 (0)20 - 5711 347
 E frans.kni...@geodan.nl
 www.geodan.nl | disclaimer
 --



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

2013-10-04 Thread Leigh Dodds
Hi Hugh,

Hasn't dbpedia always suffered from this? I've tended to do the same
as you and have encountered similar inconsistencies. I've never really
figured out whether it's down to inconsistent encoding in the data
conversion or something else.

Cheers,

L.


On Fri, Oct 4, 2013 at 1:42 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote:
 Hi.
 Chris has suggested I send the following to the LOD list, as it may be of 
 interest to several people:

 Hi Chris.
 Great stuff!

 I have a question.
 Or would you prefer I put it on the LOD list for discussion?

 It is about url encoding.

 Dbpedia:
 http://dbpedia.org/page/Ashford_%28borough%29 is not found
 http://dbpedia.org/page/Ashford_(borough) works, and redirects to
 http://dbpedia.org/resource/Borough_of_Ashford
 Wikipedia:
 http://en.wikipedia.org/wiki/Ashford_%28borough%29 works
 http://en.wikipedia.org/wiki/Ashford_(borough) works
 Both go to the page with content of 
 http://en.wikipedia.org/wiki/Borough_of_Ashford although the URL in the 
 address bar doesn't change.

 So the problem:
 I usually find things in wikipedia, and then use the last bit to construct 
 the dbpedia URI - I suspect lots of people do this.
 But as you can see, the url encoded URI, which can often be found in the 
 wild, won't allow me to do this.
 There are of course many wikipedia URLs with ( and ) in them - (artist), 
 (programmer), (borough) etc.
 It is also the same with comma and single quote.

 I think this may be different from 3.8, but can't be sure - is it intended?

 Very best
 Hugh



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Minimizing data volume

2013-09-09 Thread Leigh Dodds
Hi,

Before using compression you might also consider whether you need to
represent all of this information as RDF in the first place.

For example, rather than include the large geometries as literals, why
not store them as separate documents and let clients fetch the
geometries when needed, rather than as part of a SPARQL query?

Geometries can be served using standard HTTP compression techniques
and will benefit from caching.

You can provide summary statistics (including size of the document,
and properties of the described area, e.g. centroids) in the RDF to
help address a few common requirements, allowing clients to only fetch
the geometries they need, as they need them.

This can greatly reduce the volume of data you have to store and
provides clients with more flexibility.
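
For example, something along these lines; geo: is the W3C WGS84
vocabulary, while the ex: properties are illustrative:

@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix ex: <http://example.org/ns#> .

<http://example.org/area/42>
  geo:lat "52.2053" ;
  geo:long "0.1218" ;
  ex:boundary <http://example.org/area/42/boundary.json> ;
  ex:boundarySize 204800 .

A client gets the centroid and the size up front, and only fetches
(and caches) the full boundary document when it needs it.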

Cheers,

L.


On Mon, Sep 9, 2013 at 10:47 AM, Frans Knibbe | Geodan
frans.kni...@geodan.nl wrote:
 Hello,

 In my line of work (geographical information) I often deal with high volume
 data. The high volume is caused by single facts having a big size. A single
 2D or 3D geometry is often encoded as a single text string and can consist
 of thousands of numbers (coordinates). It is easy to see that this can cause
 performance issues with transferring and processing data. So I wonder about
 the state of the art in minimizing data volume in Linked Data. I know that
 careful publication of data will help a bit: multiple levels of detail could
 be published, coordinates could use significant digits (they almost never
 do), but it seems to me that some kind of compression is needed too. Is
 there something like a common approach to data compression at the moment?
 Something that is understood by both publishers and consumers of data?

 Regards,
 Frans





-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Open Data Rights Statements

2013-08-12 Thread Leigh Dodds
Hi,

A quick follow-up to my previous announcement. The schema and user
guides have been updated based on feedback I've received from the
wider community. I've also just published a further piece of work
that I think is relevant to this community.

This work looks at the implications of various open licences on the
creation of derived datasets. There's a blog post with pointers here:

http://theodi.org/blog/exploring-compatibility-between-data-licences

If anyone has any comments then please let me know.

Cheers,

L.

On Tue, Jul 2, 2013 at 9:23 AM, Leigh Dodds le...@ldodds.com wrote:
 Hi,

 At the UK Open Data Institute we've been working on some guidance and
 a new vocabulary to help support the publication of machine-readable
 rights statements for open data. The vocabulary builds on existing
 work in this area (e.g. Dublin Core and Creative Commons) but
 addresses a few issues that we felt were underspecified.

 The vocabulary is intended to work in a wide variety of contexts, from
 simple JSON documents and data packaging formats through to Linked
 Data and Web APIs.

 The work is now at a stage where we're keen to get wider feedback from
 the community.

 You can read a background on the work in this introductory blog post
 on the UK ODI blog:

 http://theodi.org/blog/machine-readable-rights-statements

 The draft schema can be found here:

 http://schema.theodi.org/odrs/

 And there are publisher and re-user guides to accompany it:

 https://github.com/theodi/open-data-licensing/blob/master/guides/publisher-guide.md
 https://github.com/theodi/open-data-licensing/blob/master/guides/reusers-guide.md

 We would love to hear your feedback on the work. If you do have issues
 or comments, then can I ask that you submit them as an issue to our
 github project:

 https://github.com/theodi/open-data-licensing/issues

 Thanks,

 L.

 --
 Leigh Dodds
 Freelance Technologist
 Open Data, Linked Data Geek
 t: @ldodds
 w: ldodds.com
 e: le...@ldodds.com



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



License LINK Headers and Linked Data

2013-08-12 Thread Leigh Dodds
Hi,

There's one aspect of my document on publishing machine-readable
rights statements that I want to flag to this community.

Specifically, it's the section on including references to licence and
rights statements in LINK headers in HTTP responses:

https://github.com/theodi/open-data-licensing/blob/master/guides/publisher-guide.md#linking-to-rights-statements-from-web-apis

While that information can also be published in RDF, as part of the
Linked Data response, I think adding LINK headers is very important
too, for several reasons:

Linked Data applications and browsers will commonly encounter new
resources, and the licensing information should be immediately clear.
Having this accessible outside of the response will allow user agents
to detect licences before they start retrieving data from a new
source. This will allow users to place pre-conditions on what type of
data they want to harvest/collect/process.

A HEAD request can be made on a resource to check its licensing,
before data is actually retrieved.
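
For example, a publisher following the guide might return something
like this (the response shown is illustrative):

curl -I http://data.example.org/doc/thing

HTTP/1.1 200 OK
Content-Type: text/turtle
Link: <http://opendatacommons.org/licenses/by/1.0/>; rel="license"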

Cheers,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: License LINK Headers and Linked Data

2013-08-12 Thread Leigh Dodds
Hi Mike,

On Mon, Aug 12, 2013 at 5:34 PM, mike amundsen
michael.amund...@yahoo.com wrote:
 A HEAD request can be made on a resource to check its licensing...

 Since HEAD does not resolve the LINK URLs, agents can check for the
 *existence* of licensing information, but not necessarily determine the
 licensing context.

 If the LINK @href or one of the associated @rel values is a URI/IRI that the
 agent recognizes (knows ahead of time) then that MAY provide sufficient
 context for the agent to make a judgment on whether the representation is
 marked with an acceptable license.

 Failing that, the agent will need to deref the LINK @href and parse/process
 the response in order to make a judgment on the appropriateness of the
 licensing of the initial response.

Yes, that's exactly what I meant by "check its licensing". I didn't
mean that the header itself communicated all of the necessary
information.

Thanks for spelling it out! :)

L.


-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Open Data Rights Statements

2013-07-08 Thread Leigh Dodds
Hi Bernard,

On Fri, Jul 5, 2013 at 7:12 PM, Bernard Vatant
bernard.vat...@mondeca.comwrote:

 Hello David

 Thanks for the ping, LOV lurking on public-lod anyway ...
 But since we are in public, just a reminder that the simplest way to
 suggest new vocabularies to LOV is through
 http://lov.okfn.org/dataset/lov/suggest/

 But we always of course appreciate direct conversation, and ODRS is
 definitely in the queue.

 @Leigh do you think this preliminary version is worth including in LOV as
 is (if nothing else for history) or do we wait for a more mature version?


I say go ahead and include it. I don't envisage any major changes to
the structure, although we may add some new properties in future.

I'll also look at adding alternate serializations alongside the
existing Turtle file.

Cheers,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com


Open Data Rights Statements

2013-07-02 Thread Leigh Dodds
Hi,

At the UK Open Data Institute we've been working on some guidance and
a new vocabulary to help support the publication of machine-readable
rights statements for open data. The vocabulary builds on existing
work in this area (e.g. Dublin Core and Creative Commons) but
addresses a few issues that we felt were underspecified.

The vocabulary is intended to work in a wide variety of contexts, from
simple JSON documents and data packaging formats through to Linked
Data and Web APIs.

The work is now at a stage where we're keen to get wider feedback from
the community.

You can read a background on the work in this introductory blog post
on the UK ODI blog:

http://theodi.org/blog/machine-readable-rights-statements

The draft schema can be found here:

http://schema.theodi.org/odrs/

And there are publisher and re-user guides to accompany it:

https://github.com/theodi/open-data-licensing/blob/master/guides/publisher-guide.md
https://github.com/theodi/open-data-licensing/blob/master/guides/reusers-guide.md

We would love to hear your feedback on the work. If you do have issues
or comments, then can I ask that you submit them as an issue to our
github project:

https://github.com/theodi/open-data-licensing/issues

Thanks,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Open Data Rights Statements

2013-07-02 Thread Leigh Dodds
Hi Andrea,

On Tue, Jul 2, 2013 at 11:19 AM, Andrea Perego
andrea.per...@jrc.ec.europa.eu wrote:
 That's very interesting, thank you, Leigh.

 I wonder whether you plan to consider work carried out in the framework of
 the Open Data Rights Language (ODRL) CG of W3C [1].

Yes, I'm aware of that work. ODRL is a general purpose rights
expression language that can describe re-use policies. This is similar
to the existing Creative Commons ccREL vocabulary, which also captures
the permissions, etc. that are described by a licence.

The ODRS vocabulary doesn't attempt to describe licenses themselves.
It's intended more as a way to annotate the relationship between a
dataset and one or more licences. Those licenses could be given a
machine-readable description using ccREL or ODRL. So I think the
vocabularies are compatible.

I've already added an issue to cover describing this relationship a little more.

 Also, do you plan to support the notion of licence type? This is being
 used, e.g., in vocabularies like ADMS.SW [2] and the DCAT-AP (DCAT
 Application Profile for EU data portals) [3].

Looking at the DCAT profile it seems that license type is a category
of license, e.g. public domain, royalties required, etc. To me, this
overlaps with what ccREL and ODRL already cover, but at a more
coarse-grained level.

I think for the purposes of the ODRS vocabulary we'll leave the
description of licenses reasonably opaque and defer to other
vocabularies to describe those in more detail. However, we do
distinguish between the separate licenses that relate to the data and
to the copyrightable aspects of the dataset.
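
For example, a sketch along these lines (see the draft schema for the
canonical terms; I'm writing the property names from memory here):

@prefix odrs: <http://schema.theodi.org/odrs#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://example.org/dataset> dct:rights <http://example.org/rights> .

<http://example.org/rights> a odrs:RightsStatement ;
  odrs:dataLicense <http://opendatacommons.org/licenses/odbl/1.0/> ;
  odrs:contentLicense <http://creativecommons.org/licenses/by/2.0/> ;
  odrs:attributionText "Example Department" .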

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Business Models, Profitability, and Linked Data

2013-06-10 Thread Leigh Dodds
Hi,

On Fri, Jun 7, 2013 at 5:52 PM, Kingsley Idehen kide...@openlinksw.com wrote:
 There have been a few recent threads on the LOD and Semantic
 Web mailing lists that boil down to the fundamental issues of
 profitability, business models, and Linked Data.

 Situation Analysis
 ==

 Business Model Issue
 

 The problem with Data-oriented business models is that you
 ultimately have to deal with the issue of wholesale data copying
 without attribution. That's the key issue; everything else is
 a futile dance around this concern.

Why do you think that attribution is the key issue with data-oriented
businesses?

I've spoken with a number of firms who have business models based on
data supply and have never once heard attribution being mentioned as
an issue for themselves or their customers. So I'm curious why you
think this is a problem.

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Business Models, Profitability, and Linked Data

2013-06-10 Thread Leigh Dodds
Hi,

On Mon, Jun 10, 2013 at 9:26 AM, Víctor Rodríguez Doncel
vrodrig...@fi.upm.es wrote:

 While attribution may not be hindering any business, it would be nice being
 able to specify in a machine readable form the way it should be made...

Yes there's definitely scope to do more there, and something I'm
working on at the moment.

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Business Models, Profitability, and Linked Data

2013-06-10 Thread Leigh Dodds
Hi,

On Mon, Jun 10, 2013 at 12:00 PM, Kingsley Idehen
kide...@openlinksw.com wrote:
 On 6/10/13 4:18 AM, Leigh Dodds wrote:

 Hi,

 On Fri, Jun 7, 2013 at 5:52 PM, Kingsley Idehen kide...@openlinksw.com
 wrote:

 There have been a few recent threads on the LOD and Semantic
 Web mailing lists that boil down to the fundamental issues of
 profitability, business models, and Linked Data.

 Situation Analysis
 ==

 Business Model Issue
 

 The problem with Data-oriented business models is that you
 ultimately have to deal with the issue of wholesale data copying
 without attribution. That's the key issue; everything else is
 a futile dance around this concern.

 Why do you think that attribution is the key issue with data oriented
 businesses?

 Its the key to provenance. It's the key making all contributors to the data
 value chain visible.

I don't disagree that attribution and provenance are important,
especially for Open Data, but also whenever it becomes important to
understand sources of data.

 As I've already stated, the big problem here is wholesale copying and
 reproduction without attribution. Every data publisher has to deal with this
 problem, at some point, when crafting a data oriented business model.

Every data publisher that aggregates or collects data from other
sources certainly needs to understand -- for their own workflow --
where data originates.

 I've spoken with a number of firms who have business models based on
 data supply and have never once heard attribution being mentioned as
 an issue for themselves or their customers. So I'm curious why you
 think this is a problem.

 And are those data suppliers conforming to patterns such as those associated
 with publicly available Linked Open Data? Can they provide open access to
 data and actually have a functional business model based on the
 aforementioned style of data publication?

No they weren't using Linked Open Data. No they weren't publishing
open data (it was commercially licensed for the most part). But they
all had successful business models.

But I understood you to be making a general statement about a key
issue that is common to all data business models, one that Linked Data
then solves.

I agree that every data aggregator needs to understand their workflow,
to manage their own processes. I agree that publishing details of data
provenance and attribution is important, particularly for Open Data.
And absolutely agree that Linked Data can help there.

Maybe I'm misunderstanding your point, but I'm not seeing evidence
that attribution is a key business issue that data businesses have to
solve in order to be successful. You said that "everything else is a
futile dance around this concern", which I found surprising, so I'm
curious about the evidence. I'm curious about the general business
drivers, regardless of whether the data is Linked or Open.

Making the data Linked is a solution; making the data Open might also
be a solution, but also presents its own challenges.

Sometimes it's important to know how the sausage is made, sometimes
it's not.

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: There's No Money in Linked Data

2013-05-18 Thread Leigh Dodds
Hi Pascal,

It's good to draw attention to these issues. At ISWC 2009 Tom Heath,
Kaitlin Thaney, Jordan Hatcher and I ran a workshop on legal and
social issues for data sharing [1, 2]. Key themes from the workshop
were around the importance of clear licensing, norms for attribution,
and including machine-readable license data.

At the time I did a survey of the current state of licensing of the
Linked Data cloud, there's a write-up [3] and diagram [4].

Looking over your analysis, I don't think the picture has changed
considerably since then. We need to work harder to ensure that data is
clearly licensed. But this is a general problem for Open Data, not
just Linked Open Data.

You don't say in your paper how you did the analysis. Did you use the
metadata from the LOD group on Datahub [5]? At the time I had to do
mine manually, but it wouldn't be hard to automate some of this now,
perhaps to create a regularly updated set of indicators.

One criterion that agents might apply when conducting "Follow Your
Nose" consumption of Linked Data is the licensing of the target data,
e.g. ignore links to datasets that are not licensed for your
particular usage.

Cheers,

L.

[1]. http://opendatacommons.org/events/iswc-2009-legal-social-sharing-data-web/
[2]. http://blog.okfn.org/2009/11/05/slides-from-open-data-session-at-iswc-2009/
[3]. http://blog.ldodds.com/2010/01/01/rights-statements-on-the-web-of-data/
[4]. http://www.flickr.com/photos/ldodds/4043803502/
[5]. http://datahub.io/group/lodcloud

On Sat, May 18, 2013 at 3:15 AM, Pascal Hitzler
pascal.hitz...@wright.edu wrote:
 We just finished a piece indicating serious legal issues regarding the
 commercialization of Linked Data - this may be of general interest, hence
 the post. We hope to stimulate discussions on this issue (hence the
 provocative title).

 Available from
 http://knoesis.wright.edu/faculty/pascal/pub/nomoneylod.pdf

 Abstract.
 Linked Data (LD) has been an active research area for more than 6 years and
 many aspects about publishing, retrieving, linking, and cleaning Linked Data
 have been investigated. There seems to be a broad and general agreement that
 in principle LD datasets can be very useful for solving a wide variety of
 problems ranging from practical industrial analytics to highly specific
 research problems. Having these notions in mind, we started exploring the
 use of notable LD datasets such as DBpedia, Freebase, Geonames and others
 for a commercial application. However, it turns out that using these
 datasets in realistic settings is not always easy. Surprisingly, in many
 cases the underlying issues are not technical but legal barriers erected by
 the LD data publishers. In this paper we argue that these barriers are often
 not justified, detrimental to both data publishers and users, and are often
 built without much consideration of their consequences.

 Authors:
 Prateek Jain, Pascal Hitzler, Krzysztof Janowicz, Chitra Venkatramani

 --
 Prof. Dr. Pascal Hitzler
 Kno.e.sis Center, Wright State University, Dayton, OH
 pas...@pascal-hitzler.de   http://www.knoesis.org/pascal/
 Semantic Web Textbook: http://www.semantic-web-book.org
 Semantic Web Journal: http://www.semantic-web-journal.net





-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Summarising dbpedia country coverage

2013-05-15 Thread Leigh Dodds
Thought this might be interesting for people on here. I wrote a script
to summarise the geographic coverage of dbpedia:

http://blog.ldodds.com/2013/05/15/summarising-geographic-coverage-of-dbpedia-and-wikipedia/

Lots more potential here, both for creating proper Linked Data for the
results, and for further analysis.

What other Linked Data sets include a range of geographic locations?

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Is science on sale this week?

2013-05-14 Thread Leigh Dodds




-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Content negotiation negotiation

2013-04-24 Thread Leigh Dodds
The first two indicate that responses vary based on the Accept
header, as both have a Vary: Accept header. The third doesn't, so it
doesn't support negotiation.

None of the URLs advertise what formats are available. That's not a
requirement for content negotiation, although it'd be useful.
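
For anyone wanting to check for themselves, the headers are easy to
inspect, e.g.:

curl -s -I -L http://dx.doi.org/10.3390/fi4041004

then look for a "Vary: Accept" header in the responses.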

Cheers,

L.


On Wed, Apr 24, 2013 at 2:17 PM, Phillip Lord
phillip.l...@newcastle.ac.uk wrote:

 Hmmm.

 So, taking a look at these three URLs, can you tell me
 a) which of these support content negotiation, and b) what formats
 they provide.

 http://dx.doi.org/10.3390/fi4041004
 http://dx.doi.org/10.1594/PANGAEA.527932
 http://dx.doi.org/10.1000/182

 I tried vapor -- it seems to work by probing with application/rdf+xml,
 but it appears to work by probing. I can't find any of the headers
 mentioned either, although perhaps I am looking wrongly.

 Phil



 Hugh Glaser h...@ecs.soton.ac.uk writes:

 Ah of course - thanks Mark, silly me.
 So I look at the Link: header for something like
 curl -L -i http://dbpedia.org/resource/Luton
 Which gives me the information I want.

 Anyone got any offers for how I would use Linked Data to get this into my 
 RDF store?

 So then I can do things something like:
 SELECT ?type ?source FROM { <http://dbpedia.org/resource/Luton> ?foo ?file .
 ?file ?type ?source . }
 (I think).

 I suppose it would need to actually be returned from a URI at the site - I
 can't get a header as URI resolution - right?
 And I would need an ontology?

 Cheers.

 On 23 Apr 2013, at 19:49, Mark Baker dist...@acm.org
  wrote:

 On Tue, Apr 23, 2013 at 1:42 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote:

 On 22 Apr 2013, at 12:18, Phillip Lord phillip.l...@newcastle.ac.uk 
 wrote:
 snip
 We need to check for content negotiation; I'm not clear, though, how we
 are supposed to know what forms of content are available. Is there
 anyway we can tell from your website that content negotiation is
 possible?
 Ah, and interesting question.
 I don't know of any, but maybe someone else does?

 Client-side conneg, look for Link rel=alternate headers in response

 Server-side conneg, look for Vary: Content-Type in response

 Mark.




 --
 Phillip Lord,   Phone: +44 (0) 191 222 7827
 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
 School of Computing Science,
 http://homepages.cs.ncl.ac.uk/phillip.lord
 Room 914 Claremont Tower,   skype: russet_apples
 Newcastle University,   twitter: phillord
 NE1 7RU




-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: SPARQL, philosophy n'stuff..

2013-04-22 Thread Leigh Dodds
Hi Barry,

On Mon, Apr 22, 2013 at 9:17 AM, Barry Norton barry.nor...@ontotext.com wrote:

 I'm sorry, but you seem to have misunderstood the use of a graph URI
 parameter in indirect graph addressing for GSP.

 I wish all GSP actions addressed graphs directly, Queries were all GETs, and
 that Updates were all PATCH documents, but a degree of pragmatism has been
 applied.

I think Mark's point was that SPARQL 1.1/GSP specify fixed query
parameters ("query", "graph") in the specification, requiring clients
to construct URIs rather than using hypermedia.
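
i.e. for indirect graph identification a client is expected to
construct a request of the form (the path is illustrative, but the
"graph" parameter name is fixed by the spec):

GET /rdf-graph-store?graph=http%3A%2F%2Fexample.org%2Fgraphs%2Fmine HTTP/1.1
Host: example.org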

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Restpark - Minimal RESTful API for querying RDF triples

2013-04-18 Thread Leigh Dodds
Hi Hugh,

On Thu, Apr 18, 2013 at 10:56 AM, Hugh Glaser h...@ecs.soton.ac.uk wrote:
 (Yes, Linked Data API is cool!, and thanks for getting back to the main 
 subject, although I somehow doubt anyone is expecting to read anything about 
 it in this thread now :-) )

I'm still hoping we might return to the original topic :)

What this discussion, and in fact most related discussions about
SPARQL as a web service, seems to overlook is that there are several
different issues in play here:

* Whether SPARQL is more accessible to developers than other forms of
web API. For example, is the learning curve harder or easier?

* Whether offering query languages like SPARQL, SQL, YQL, etc is a
sensible option when offering a public API and what kinds of quality
of service can be wrapped around that. Or do other forms of API offer
more options for providing quality of service by trading off power of
query expression?

* Techniques for making SPARQL endpoints scale in scenarios where the
typical query patterns are unknown (which is true of most public
endpoints). Scaling and quality of service considerations for a public
web service and a private enterprise endpoint are different. Not all
of the techniques that people use, e.g. query timeouts or partial
results, are actually standardised, so there's plenty of scope for
more exploration here.

* Whether SPARQL is the only query language we need for RDF, or for
more general graph databases, or whether there is room for other
forms of graph query language.

The Linked Data API was designed to provide a simplified read-only API
that is less expressive than full SPARQL. The goals were to make
something easier to use, but not preclude helping developers towards
using full SPARQL if that's what they wanted. It also fills a
shortfall with most Linked Data publishing approaches, i.e. that
getting lists of things, possibly as a paged list, possibly with some
simple filtering, is not easy. We don't need a full graph query
language for that. The Linked Data Platform is looking at that area
too, but it's also got a lot more requirements it's trying to address.
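
To make the list case concrete, a typical Linked Data API request
looks something like this (the path and filter are illustrative;
_page and _pageSize are the API's reserved parameters):

GET /api/schools?district=Bath&_page=2&_pageSize=50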

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Restpark - Minimal RESTful API for querying RDF triples

2013-04-18 Thread Leigh Dodds
Hi Paul,

On Thu, Apr 18, 2013 at 11:54 AM, Paul Groth p.t.gr...@vu.nl wrote:
 Hi Leigh

 The problem is that it's really easy to write sparql queries that are
 inefficient when you don't know the data [1] and even when you do the
 flexibility of sparql means that people can easily end-up writing complex
 hard to process queries.

Totally agree with your assessment. I was just observing that there
are a number of factors in play which result in a design trade-off,
meaning there is no right answer or winning solution.

My experience is much the same as yours, which is why I've been
experimenting with APIs over SPARQL and worked with Jeni and Dave on
the design of the Linked Data API. I think it's pretty good, but I
don't think we've done a good job yet of documenting it. I also
suspect there's an even simpler subset or profile in there, but I've
not had the time yet to dig through and see what kinds of APIs people
are building with it.

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Restpark - Minimal RESTful API for querying RDF triples

2013-04-18 Thread Leigh Dodds
Hi,

On Thu, Apr 18, 2013 at 12:01 PM, Luca Matteis lmatt...@gmail.com wrote:
 Thanks Paul,

 That is exactly what my point was entirely about. Many services don't expose
 their SQL interface, so why should Linked Data?

 Regarding this Linked Data API, it seems to still require a SPARQL endpoint.
 In fact it states that it is a proxy for SPARQL. Would it simply be possible
 to implement this API without SPARQL on top of a regular database that
 contains triples?

While the specification talks about mapping to a SPARQL endpoint, the
processing model would potentially allow you to use different
backends. Servicing a Linked Data API request involves several steps:

1. Mapping the request to a query (currently a SPARQL SELECT) to
identify the list of resources of interest
2. Mapping the request to a query (currently a SPARQL CONSTRUCT) to
produce a description of each item on the list
3. Serialising the results

Broadly speaking you could swap out steps 1 and 2.

For example you could map the first step to a search query that
produces a list of results from a search engine, or a SQL query that
extracts the resources from a database. You could map the second step
to requests to a document database that fetches pre-existing
descriptions of each item.

The API supports a number of filtering and sorting options, which will
add some complexity to both stages, but I don't think there's any show
stoppers in there.
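
As a sketch, a request like GET /api/books?author=Austen (the names
here are illustrative) maps to a pair of queries along these lines:

PREFIX dct: <http://purl.org/dc/terms/>

# step 1: select the resources of interest
SELECT ?item WHERE { ?item dct:creator "Austen" } LIMIT 10

# step 2: describe each item found in step 1
CONSTRUCT { ?item ?p ?o }
WHERE { VALUES ?item { <http://example.org/books/1> } ?item ?p ?o }

Either step could instead be answered by a search engine, a SQL
database or a document store, as described above.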

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: SPARQL, philosophy n'stuff..

2013-04-18 Thread Leigh Dodds
Hi,

On Thu, Apr 18, 2013 at 12:21 PM, Jürgen Jakobitsch SWC
j.jakobit...@semantic-web.at wrote:
 i think there's yet another point overlooked :

 what we are trying to do is to create barrier free means of
 communication on data level in a globalized world. this effort requires
 a common language.

Did you mean a common *query* language?

I'm not sure I agree, mainly because no-one has yet created such a
thing, so we might find out that the bigger challenges are elsewhere.
I guess time will tell :)

I used to think that there might be convergence around common query
languages for APIs, but there's little evidence of that happening.

 my personal view is that providing simplier subsets of such a language
 (an api) only leads to the fact that nobody will learn the language (see
 pocket calculators,...), although there's hardly anything easier than to
 write a sparql query, it can be learned in a day.

 i do not really understand where this the developer can't sparql, so
 let's provide something similar (easier) - idea comes from.

Well, if our goal is to create barrier-free data sharing and re-use
then we should focus on achieving that regardless of technology, and
should be open to a variety of approaches. We can't decide that SPARQL
is the right solution and then just expect everyone to learn it.

Maybe it only takes a day to learn SPARQL, but personally I find that
usually I can get up to speed with a custom API in a few minutes, so
that's even faster.

And it turns out that often the issue isn't just learning SPARQL
alone, it's also learning the data model [1].

 did anyone provide me with a wrapper for the english language? nope, had
 to learn it.

But I bet you learnt it in stages, using a pedagogical approach that
guided you towards the basic building blocks first. And I expect there
were other reasons -- network effects -- why learning English was
worth the up-front effort. We're not there with SPARQL.

Cheers,

L.

[1]. http://blog.ldodds.com/2011/06/16/giving-rdf-datasets-more-affordance/

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Restpark - Minimal RESTful API for querying RDF triples

2013-04-18 Thread Leigh Dodds
Hi,

On Thu, Apr 18, 2013 at 4:23 PM, Alan Ruttenberg
alanruttenb...@gmail.com wrote:
 Luca,

 In the past I have suggested a simple way to create simple restful services
 based on SPARQL. This could easily be implemented as an extension to your
 beginning of restpark.

 The idea is to have the definition of a service be a sparql query with
 blanks, and possibly some extra annotations.

That's essentially what we called SPARQL Stored Procedures in Kasabi:
SPARQL queries bound to URIs, with parameters injected from the query
string. We also had transformation of results using XSLT. Swirrl have
implemented this as named queries [1], and I used their name when
writing up the pattern [2].

One set of annotations I'm planning on adding to sparql-doc [3] is the
parameters that need to be injected and, optionally, a path to bind
the query to when mounted. The goal is to allow a package of queries
to be mounted at a URL and used as named queries.
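
A sketch of how a packaged query might look (the annotation syntax
here is illustrative, not settled):

# @param type the class to count, injected from the query string
# @path /queries/count-by-type
SELECT (COUNT(?s) AS ?count) WHERE { ?s a ?type }

# invoked as:
# GET /queries/count-by-type?type=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson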

[1]. http://blog.swirrl.com/articles/new-publishmydata-feature-named-queries
[2]. http://patterns.dataincubator.org/book/named-query.html
[3]. http://blog.ldodds.com/2013/01/30/sparql-doc/

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Coping with gaps in linked data (UK postcodes)?

2013-04-12 Thread Leigh Dodds
Hi Stephen,

Really your only option is to mint your own URIs, but then later build
in links to the official URIs if/when they become available:

http://patterns.dataincubator.org/book/proxy-uris.html

The postcodes make good Natural Keys for building your URIs. This will
help to automatically generate links not just to your data, but also
to the official version when available.
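
For example, minting a URI from the postcode itself and then asserting
the equivalence later (the domains here are illustrative):

@prefix owl: <http://www.w3.org/2002/07/owl#> .

# added if/when an official NI postcode URI appears:
<http://data.example.org/id/postcode/BT11AA>
  owl:sameAs <http://official.example.net/id/postcode/BT11AA> .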

Cheers,

L.


On Fri, Apr 12, 2013 at 2:08 PM, Cresswell, Stephen
stephen.cressw...@tso.co.uk wrote:

 Hello,

 In our application, we wish to publish linked data, including addresses
 with postcode URIs.  The postcode URIs provided by Ordnance Survey for
 England, Wales and Scotland are really useful, with the postcode URIs
 dereferencing to provide useful information including co-ordinates.

 However, the geographical extent of our data includes Northern Ireland,
 which is outside the scope of (British) Ordnance Survey and not included
 in their dataset.  The equivalent postcode data for Northern Ireland is
 available from the NI government body NISRA, but it is not on an open
 license.

 This leaves us with a question about what URIs to use for Northern
 Ireland postcodes, as we know of no existing URI scheme for Northern
 Ireland postcodes.

 If we generate postcode URIs using the same pattern as the rest of the
 UK, those URIs would be in the Ordnance Survey's domain, but NI
 postcodes are not actually in their dataset and they won't dereference,
 so that seems wrong.

 If we are to have dereferencable URIs, we would presumably have to host
 them in our own domain, which is definitely not the most appropriate
 place for them to be.  If we buy a license to use the NI postcode data,
 we still wouldn't be able to republish it as linked data.  Presumably,
 however, there is some geographical information that is open and could
 be published, e.g. coarser geographical information based on just the
 postcode district.

 Does anyone have any advice on best practice, either for the specific
 problem (NI postcodes) or for the general problem of how to cope with
 URIs based on an existing coding scheme (e.g. postcodes), where the
 published URIs don't cover all of the original codes?

 Stephen Cresswell
 The Stationery Office


 This email is confidential and may also be privileged and/or proprietary to 
 The Stationery Office Limited. It may be read, copied and used only by the 
 intended recipient(s). Any unauthorised use of this email is strictly 
 prohibited. If you have received this email in error please contact us 
 immediately and delete it and any copies you have made. Thank you for your 
 cooperation.
 The Stationery Office Limited is registered in England under Company No. 
 3049649 at 1-5 Poland Street, London, W1F 8PR






-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Content negotiation for Turtle files

2013-02-06 Thread Leigh Dodds
Hi,

On Wed, Feb 6, 2013 at 9:54 AM, Bernard Vatant
bernard.vat...@mondeca.com wrote:
 ...
 But what I still don't understand is the answer of Vapour when requesting
 RDF/XML :

 1st request while dereferencing resource URI without specifying the desired
 content type (HTTP response code should be 303 (redirect)): Passed
 2nd request while dereferencing resource URI without specifying the desired
 content type (Content type should be 'application/rdf+xml'): Failed
 2nd request while dereferencing resource URI without specifying the desired
 content type (HTTP response code should be 200): Passed

From a purely HTTP and Content Negotiation point of view, if a client
doesn't specify an Accept header then it's perfectly legitimate for a
server to return a default format of its choosing. I think it could
also decide to serve a 300 status code and prompt the client to choose
an option that's available.

From an interoperability point of view, having a default format that
clients can rely on is reasonable. Until now, RDF/XML has been the
standardised format that we can all rely on, although shortly we may
all collectively decide to prefer Turtle. So ensuring that RDF/XML is
available seems like a reasonable thing for a validator to try and
test for.

But there are several ways that test could have been carried out.
E.g. Vapour could have checked that there was an RDF/XML version and
provided you with some reasons why that would be useful. Perhaps as a
warning, rather than a fail.

The explicit check for RDF/XML being available AND being the default
preference of the server is raising the bar slightly, but it's still
trying to aim for interop.

Personally I think I'd implement this kind of check as "ensure there
is at least one valid RDF serialisation available, either RDF/XML or
Turtle". I wouldn't force a default on a server, particularly as we
know that many clients can consume multiple formats.
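
That kind of check is easy to automate, e.g. (the URL here is
illustrative):

for fmt in application/rdf+xml text/turtle; do
  curl -s -o /dev/null -w "$fmt => %{content_type}\n" \
    -L -H "Accept: $fmt" http://example.org/resource
done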

This is where automated validation tools have to tread carefully:
while they play an excellent role in encouraging consistency, the
tests they perform and the feedback they give need to have some
nuance.

Cheers,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Linked Data Adoption Challenges Poll

2012-09-13 Thread Leigh Dodds
Hi,

You might need to clarify your questions. I can have a guess at what
they mean, but my guesses may not be right.

Presumably you are also targeting this poll at people trying (and
failing) to adopt linked data. In that case you might want to broaden
the base of potential respondents. People on these lists may not
reflect all of the issues.

Cheers,

L.

On Thu, Sep 13, 2012 at 5:34 PM, Kingsley Idehen kide...@openlinksw.com wrote:
 All,

 I've created a poll oriented towards capturing data about issues that folks
 find most challenging re., Linked Data Adoption.

 Please cast your vote as the results will be useful to all Linked Data
 stakeholders.

 Link: http://poll.fm/3w0cb .

 --

 Regards,

 Kingsley Idehen
 Founder  CEO
 OpenLink Software
 Company Web: http://www.openlinksw.com
 Personal Weblog: http://www.openlinksw.com/blog/~kidehen
 Twitter/Identi.ca handle: @kidehen
 Google+ Profile: https://plus.google.com/112399767740508618350/about
 LinkedIn Profile: http://www.linkedin.com/in/kidehen








-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Can we create better links by playing games?

2012-06-20 Thread Leigh Dodds
On Wed, Jun 20, 2012 at 2:19 PM, Melvin Carvalho
melvincarva...@gmail.com wrote:


 On 20 June 2012 15:11, Kingsley Idehen kide...@openlinksw.com wrote:

 On 6/19/12 3:23 PM, Martin Hepp wrote:

 [1] Games with a Purpose for the Semantic Web, IEEE Intelligent Systems,
 Vol. 23, No. 3, pp. 50-60, May/June 2008.


 Do the games at: http://ontogame.sti2.at/games/, still work? The more data
 quality oriented games the better re. LOD and the Semantic Web in general.

 Others: Are there any other games out there?


 iand is working on a game:

 http://blog.iandavis.com/2012/05/21/wolfie/

Is that relevant? :)

L.



New draft of Linked Data Patterns book

2012-06-01 Thread Leigh Dodds
Hi,

There's a new draft of the Linked Data patterns book available:

http://patterns.dataincubator.org/book/

There have been a number of revisions across the pattern catalogue,
including the addition of new introductory sections to each chapter.
There are a total of 12 new patterns, many of which cover data
management patterns relating to the use of named graphs.

Cheers,

L.



Re: Decommissioning a linked data site

2012-06-01 Thread Leigh Dodds
Hi,

On Fri, Jun 1, 2012 at 7:34 AM, Antoine Isaac ais...@few.vu.nl wrote:
 @Tim:

 For total extra kudos, provide query rewriting rules
 from your site to LoC data, linked so that you can write a program
 to start with a sparql query which fails
 and figures out from metadata how to turn it into one which works!


 Is the combination of 301 + owl:sameAs that we have used for RAMEAU, e.g,
 http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11932889r
 good enough?
 Or would you recommend more/different?

I've started to capture some advice here:

http://patterns.dataincubator.org/book/unpublish.html

Cheers,

L



Re: Decommissioning a linked data site

2012-06-01 Thread Leigh Dodds
Hi,

On Fri, Jun 1, 2012 at 3:30 PM, Bradley Allen bradley.p.al...@gmail.com wrote:
 Leigh- This is great. The question that comes up for me out of what you've
 written for unpublishing brings me back to Antoine's question: is it
 appropriate to use a relation other than owl:sameAs that is more specific to
 the domain of the affected datasets being mapped, or is the nature of
 unpublishing such that one would, as opposed to my reasoning earlier, be as
 broad as possible in asserting equivalence, and use owl:sameAs in every such
 case?

Really interesting question, and this might prompt me to revise the pattern :)

So, generally, I advocate using the appropriate equivalence relation
that relates to a specific domain. As I wrote in [1], it's best to use
the most appropriate equivalence link, as they have varying semantics.

But for the unpublishing use case I think I'd personally lean towards
*always* using owl:sameAs at least in the case where we are returning
a 301 status code. I've previously come to the conclusion [2] that a
301 implies a sameAs statement. The intent seems very similar to a
sameAs. Rewriting local links to use a new location is very similar to
smushing descriptions in an RDF dataset such that statements only
relate to the new URI.

However I can see arguments to the effect that the new authority might
have a slightly different definition of a resource than the original
publisher, such that an owl:sameAs might be inappropriate. That's why
I left the advice in the pattern slightly open-ended: I think it may
need to be evaluated on a case-by-case basis, but owl:sameAs seems
like a good workable default to me.
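
To spell out the combination being discussed (the URIs here are
illustrative):

HTTP/1.1 301 Moved Permanently
Location: http://new.example.org/concept/123

plus, in the data served for the old URI:

<http://old.example.org/concept/123>
  owl:sameAs <http://new.example.org/concept/123> .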

Cheers,

L.

[1]. http://patterns.dataincubator.org/book/equivalence-links.html
[2]. http://www.ldodds.com/blog/2007/03/the-semantics-of-301-moved-permanently/



Re: looking for skos vocabularies

2012-05-17 Thread Leigh Dodds
Hi,

There's a pretty comprehensive set of links available here:

http://www.w3.org/2001/sw/wiki/SKOS/Datasets

Cheers,

L.

On Thu, May 17, 2012 at 4:43 PM, Christian Morbidoni
christian.morbid...@gmail.com wrote:
 Hi,

 I've been looking for some examples of SKOS vocabularies to use as a
 real-world test case in a project.
 Surprisingly I cannot find much around... does someone know about an archive
 of SKOS vocabularies or some good examples of SKOS in use?
 I'm starting to wonder... are people using SKOS out there?

 best,

 Christian




Layered Data

2012-05-04 Thread Leigh Dodds
Hi,

I've written up some thoughts on considering datasets as layers that
can be combined to create useful aggregations. The concept originated
with Dan Brickley and I see the RDF WG are considering the term as an
alternative to named graph. My own usage is more general. I thought
I'd share a link here to see what people thought.

The paper is at:

http://ldodds.com/papers/layered-data.html

And a blog post with some commentary here:

http://www.ldodds.com/blog/2012/05/layered-data-a-paper-some-commentary/

Cheers,

L.



Re: Layered Data

2012-05-04 Thread Leigh Dodds
Hi Pablo,

On Fri, May 4, 2012 at 10:37 AM, Pablo Mendes pablomen...@gmail.com wrote:

 Interesting thoughts. It would be nice to have some default widely
 accepted facets within an extensible model.

Thanks.

 I had a somewhat related discussion with Niko Popitsch last year on how
 database views could look like in the LOD world. The discussion was a
 follow up to his talk:
 Keep Your Triples Together: Modeling a RESTful, Layered Linked Data Store
 http://cs.univie.ac.at/research/research-groups/multimedia-information-systems/publikation/infpub/2910/

Thanks for the pointer, I'll take a look :)

Cheers,

L.



Re: Datatypes with no (cool) URI

2012-04-04 Thread Leigh Dodds
(apologies if this is a re-post, I don't think it made it through y'day)

Hi

On Tue, Apr 3, 2012 at 6:29 PM, Dave Reynolds dave.e.reyno...@gmail.com wrote:
 On 03/04/12 16:38, Sarven Capadisli wrote:

 On 12-04-03 02:33 PM, Phil Archer wrote:

 I'm hoping for a bit of advice and rather than talk in the usual generic
 terms I'll use the actual example I'm working on.

 I want to define the best way to record a person's sex (this is related
 to the W3C GLD WG's forthcoming spec on describing a Person [1]). To
 encourage interoperability, we want people to use a controlled
 vocabulary and there are several that cover this topic.
...

 Perhaps I'm looking at your problem the wrong way, but have you looked
 at the SDMX Concepts:

 http://purl.org/linked-data/sdmx/2009/code#sex

 -Sarven


 I was going to suggest that :)

+1. A custom datatype doesn't seem correct in this case. Treating
gender as a category/classification captures both the essence that
there's more than one category and that people may differ in how they
would assign classifications.
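
e.g. something like this, where the object is a code rather than a
datatyped literal (the ex:sex property is illustrative; the code is
from the SDMX code list mentioned above):

@prefix sdmx-code: <http://purl.org/linked-data/sdmx/2009/code#> .
@prefix ex: <http://example.org/ns#> .

<http://example.org/people/alice> ex:sex sdmx-code:sex-F .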

I wrote a bit about Custom Datatypes here:

http://patterns.dataincubator.org/book/custom-datatype.html

This use case aside, there ought to be more information to guide
people towards how to do this correctly.

See also:

http://www.w3.org/TR/swbp-xsch-datatypes/

Cheers,

L.



Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-27 Thread Leigh Dodds
Hi,

On Tue, Mar 27, 2012 at 2:02 PM, Jonathan A Rees r...@mumble.net wrote:
 ...
 There is a difference, since what is described could be an IR that
 does not have the description as content. A prime example is any DOI,
 e.g.

 http://dx.doi.org/10.1371/journal.pcbi.1000462

 (try doing conneg for RDF). The identified resource is an IR as you
 suggest, but the representation (after the 303 redirect) is not its
 content.

A couple of comments here:

1. It's not any DOI. I believe CrossRef are still the only registrar
that supports this, but I might have missed an announcement. That's
still 50m DOIs, though.
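
For anyone who wants to try it, CrossRef DOIs can be negotiated like
so:

curl -L -H "Accept: application/rdf+xml" http://dx.doi.org/10.1371/journal.pcbi.1000462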

2. Are you sure it's an Information Resource? The DOI handbook [1]
notes that while typically used to identify intellectual property, a
DOI can be used to identify anything. The CrossRef guidelines [2]
explain that "[a]s a matter of current policy, the CrossRef DOI
identifies the work, not its various potential manifestations".

Is a FRBR work an Information Resource? Personally I'd say not, but
others may disagree. But as Dan Brickley has noted elsewhere in the
discussion, there's other nuances to take into account.

[1]. http://www.doi.org/handbook_2000/intro.html#1.6
[2]. http://crossref.org/02publishers/15doi_guidelines.html

Cheers,

L.



Re: Document Action: 'The Hypertext Transfer Protocol (HTTP) Status Code 308 (Permanent Redirect)' to Experimental RFC (draft-reschke-http-status-308-07.txt)

2012-03-27 Thread Leigh Dodds
Hi James,

On Tue, Mar 27, 2012 at 2:15 AM, James Leigh ja...@3roundstones.com wrote:
 Could this 308 (Permanent Redirect) give us a way to cache a probe URI's
 definition document location?

 An issue people have with httpRange-14 is that 303 redirects can't be
 cached. If we could agree to use a 308 response as a cache-able
 alternative to 303, we could reduce server load and speed client URI
 processing (by caching the result of a probe URI).

I'm missing how that would help; could you elaborate? The semantics of
that response code are that the resource has permanently moved, which
seems very different to a 303.

A strict reading and application of the rules would suggest that the
new URI should be considered a replacement of the original, so sameAs,
rather than a description of.
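To illustrate the difference (a sketch only; the URI is made up and
I'm using the Python requests library for brevity):

import requests

r = requests.get("http://example.org/id/book", allow_redirects=False)
if r.status_code == 303:
    # See Other: the Location holds a description of the resource,
    # not the resource itself
    description_uri = r.headers["Location"]
elif r.status_code == 308:
    # Permanent Redirect: a strict reading says the resource itself has
    # moved, so the Location is a replacement (sameAs), not a description
    replacement_uri = r.headers["Location"]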

L.



Re: Change Proposal for HttpRange-14

2012-03-26 Thread Leigh Dodds
Hi Tim,

On Sun, Mar 25, 2012 at 8:26 PM, Tim Berners-Lee ti...@w3.org wrote:
 ...
 For example, To take an arbitrary one of the trillions out there, what does
 http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108pageno=11
  identify, there being no RDF in it?
 What can I possibly do with that URI if the publisher has not explicitly
 allowed me to use it
 to refer to the online book, under your proposal?

You can do anything you want with it. You could use it to record
statements about your HTTP interactions, e.g. retrieval status and
date. Or, because RDF lets anyone say anything, anywhere, you could
just decide to use that as the URI for the book and annotate it
accordingly. The obvious caveat and risk is that the publisher might
subsequently disagree with you if they do decide to publish some RDF.
I can re-use your data if I decide that risk is acceptable and we can
still usefully interact.

Even if Gutenberg.org did publish some RDF at that URI, you still have
the risk that they could change their mind at a later date.
httprange-14 doesn't help at all there. Lack of precision and
inconsistency is going to be rife whatever form the URIs or response
codes used.

Encouraging people to say what their URIs refer to is the very first
piece of best practice advice.

L.



Re: Middle ground change proposal for httpRange-14

2012-03-26 Thread Leigh Dodds
Hi David,

On Sun, Mar 25, 2012 at 6:50 PM, David Wood da...@3roundstones.com wrote:
 Hi David,

 *sigh*.  I said recently that I would rather chew my arm off than re-engage 
 with http-range-14.  Apparently I have very little self control.

 On Mar 25, 2012, at 11:54, David Booth wrote:
 Jeni, Ian, Leigh, Nick, Hugh, Steve, Masahide, Gregg, Niklas, Jerry,
 Dave, Bill, Andy, John, Ben, Damian, Thomas, Ed Summers and Davy,

 I have drafted what I think may represent a middle ground change
 proposal and I am wondering if something along this line would also meet
 your concerns:
 http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol


 Highlights of this proposal:
 - It enables a URI owner to unambiguously convey any URI definition to
 an interested client.

 +1 to this.  I have long been a fan of unambiguous definition.  The summary
 argument against is Leigh Dodds's
 "show what is actually broken" approach and the summary argument for is my
 "we need to invent new ways to associate RDF
 with other Web resources in a discoverable manner to allow for
 'follow-your-nose' across islands of Linked Data".

I may be misreading you here, but I'm not against unambiguous
definition. My "show what is actually broken" comment (on twitter) was
essentially the same question as I've asked here before, and as Hugh
asked again recently: what applications currently rely on httprange-14
as it is written today? That's useful so we can get a sense of what
would break with a change. So far there's been 2 examples I think.

That's in contrast to a lot of publisher data (but granted, not yet
quantified as to how much) that breaks the rules of httprange-14. I'd
prefer to fix that even if at the cost of breaking a few apps. But we
all know there are very, very few apps that consume Linked Data today,
so changing client expectations isn't a massive problem.

Identifying a set of publishing patterns that show how publishers
can reduce ambiguity, and advice for clients on how to tread carefully
in the face of ambiguity and inconsistency, is a better starting point
IMHO. The goal there being to encourage more unambiguous publishing of
data, by demonstrating value at every step.

Cheers,

L.



Re: What would break? Re: httpRange-14

2012-03-26 Thread Leigh Dodds
Hi Kingsley,

On Mon, Mar 26, 2012 at 6:38 PM, Kingsley Idehen kide...@openlinksw.com wrote:
 ...
 Leigh,

 Everything we've built in the Linked Data realm leverages the findings of
 HttpRange-14 re. Name/Address (Reference/Access) disambiguation. Our Linked
 Data clients adhere to these findings. Our Linked Data servers do the same.

By we I assume you mean OpenLink. Here's where I asked the original
question [1]. Handily Ian Davis published an example resource that
returns a 200 OK when you de-reference it [2].

I just tested that in URI Burner [3] and it gave me broadly what I'd
expect, i.e. the resources mentioned in the resulting RDF. I didn't
see any visible breakage. Am I seeing fall-back behaviour?

To answer your other question, I do understand the benefits that can
accrue from having separate URIs for a resource and its description. I
also see arguments for not always requiring both.

As a wider comment and question to the list, I'll freely admit that
what I've always done when fetching Linked Data is let my HTTP library
just follow redirects. Not to deal with 303s specifically, but because
that's just good user agent behaviour.
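Concretely, something like this (a sketch; the URI is illustrative and
I'm using the Python requests library):

import requests

r = requests.get("http://example.org/id/toucan",
                 headers={"Accept": "text/turtle"})
# the library follows any 30x chain by default, but the hops remain
# inspectable afterwards if you care about them
for hop in r.history:
    print(hop.status_code, hop.headers.get("Location"))
print(r.status_code, r.url)  # the document we finally retrieved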

I've always assumed that everyone else does the same. But maybe I'm
wrong or in the minority.

Are people really testing status codes and changing subsequent
processing behaviour because of that? It looks like there's little or
no breakage in Sindice for example [4].

Based on Tim's comments he has been doing that; are other people doing
the same? And if we're not, then who is this ruling benefiting?

Tim, could you share more about what application behaviour your
inferences support? Are those there to support specific features for
users?

Cheers,

L.

[1]. http://www.mail-archive.com/public-lod@w3.org/msg06735.html
[2]. http://iandavis.com/2010/303/toucan
[3]. 
http://linkeddata.uriburner.com/about/html/http/iandavis.com/2010/303/toucan
[4]. http://www.mail-archive.com/public-lod@w3.org/msg06746.html



Re: What would break? Re: httpRange-14

2012-03-26 Thread Leigh Dodds
Hi,

On Mon, Mar 26, 2012 at 7:59 PM, Kingsley Idehen kide...@openlinksw.com wrote:
 On 3/26/12 2:09 PM, Leigh Dodds wrote:

 Hi Kingsley,

 On Mon, Mar 26, 2012 at 6:38 PM, Kingsley Idehenkide...@openlinksw.com
  wrote:

 ...
 Leigh,

 Everything we've built in the Linked Data realm leverages the findings of
 HttpRange-14 re. Name/Address (Reference/Access) disambiguation. Our
 Linked
 Data clients adhere to these findings. Our Linked Data servers do the
 same.

 By we I assume you mean OpenLink. Here's where I asked the original
 question [1]. Handily Ian Davis published an example resource that
 returns a 200 OK when you de-reference it [2].

 Support was done (basically reusing our old internal redirection code)
 when that post was made by Ian.


 I just tested that in URI Burner [3] and it gave me broadly what I'd
 expect, i.e. the resources mentioned in the resulting RDF. I didn't
 see any visible breakage. Am I seeing fall-back behaviour?


 As per comment above it's implemented. We have our own heuristic for handling
 self-describing resources. My concern is that what we've done isn't the norm,
 i.e., I don't see others working that way, instinctively. You have to be
 over the Linked Data comprehension hump to be in a position to emulate what
 we've done.

OK, I thought you might have done, so thanks for the confirmation. But
this further demonstrates that we don't necessarily need redirects.

 
 Are people really testing status codes and changing subsequent
 processing behaviour because of that? It looks like there's little or
 no breakage in Sindice for example [3].

 Based on Tim's comments he has been doing that, are other people doing
 the same? And if you have to ask if we're not, then who is this ruling
 benefiting?

 We do the same, but we also go beyond (i.e., what you call a fall-back).

Would you care to elaborate on that? i.e: what inferences are you
deriving from the protocol interaction?

I can see that for a .txt document you are inferring that it's a
foaf:Document [1].

I'm still also interested to hear from others.

[1]. 
http://linkeddata.uriburner.com/about/html/http/www.gutenberg.org/files/76/76.txt

Cheers,

L.



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-21 Thread Leigh Dodds
Hi,

On 19 October 2011 23:10, Jonathan Rees j...@creativecommons.org wrote:
 On Wed, Oct 19, 2011 at 5:29 PM, Leigh Dodds leigh.do...@talis.com wrote:
 Hi Jonathan

 I think what I'm interested in is what problems might surface and
 approaches for mitigating them.

 I'm sorry, the writeup was designed to do exactly that. In the example
 in the conflict section, a miscommunication (unsurfaced
 disagreement) leads to copyright infringement. Isn't that a problem?

Yes it is, and these are the issues I think that are worth teasing out.

I'm afraid though that I'll have to admit to not understanding your
specific example. There's no doubt some subtlety that I'm missing (and
a rotten head cold isn't helping). Can you humour me and expand a
little? The bit I'm struggling with is:

[[[
<http://example/x> xhv:license
   <http://creativecommons.org/licenses/by/3.0/> .

According to D2, this says that document X is licensed. According to
S2, this says that document Y is licensed.
]]]

Taking the RDF data at face value, I don't see how the D2 and S2
interpretations differ. Both say that <http://example/x> has a
specific license. How could an S2-assuming client assume that the
data is actually about another resource?

I looked at your specific examples, e.g. Flickr and Jamendo:

The RDFa extracted from the Flickr photo page does seem to be
ambiguous. I'm guessing the intent is to describe the license of the
photo and not the web page. But in that case, isn't the issue that
Flickr aren't being precise enough in the data they're returning?

The RDFa extracted from the Jamendo page including type information
(from the Open Graph Protocol) that says that the resource is an
album, and has a specific Creative Commons license. I think that's
what's intended isn't it?

Why does a client have to assume a specific stance (D2/S2)? Why not
simply take the data returned at face value? It's then up to the
publisher to be sure that they're making clear assertions.

 There is no heuristic that will tell you which of the two works is
 licensed in the stated way, since both interpretations are perfectly
 meaningful and useful.

 For mitigation in this case you only have a few options
 1. precoordinate (via a disambiguating rule of some kind, any kind)
 2. avoid using the URI inside ... altogether - come up with distinct
 wads of RDF for the 2 documents
 3. say locally what you think ... means, effectively treating these
 URIs as blank nodes

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-21 Thread Leigh Dodds
Hi,

On 20 October 2011 13:25, Ed Summers e...@pobox.com wrote:
 On Wed, Oct 19, 2011 at 12:59 PM, Leigh Dodds leigh.do...@talis.com wrote:
 So, can we turn things on their head a little. Instead of starting out
 from a position that we *must* have two different resources, can we
 instead highlight to people the *benefits* of having different
 identifiers? That makes it more of a best practice discussion and one
 based on trade-offs: e.g. this class of software won't be able to
 process your data correctly, or you'll be limited in how you can
 publish additional data or metadata in the future.

 I don't think I've seen anyone approach things from that perspective,
 but I can't help but think it'll be more compelling. And it also has
 the benefits of not telling people that they're right or wrong, but
 just illustrate what trade-offs they are making.

 I agree Leigh. The argument that you can't deliver an entity like a
 Galaxy to someone's browser sounds increasingly hollow to me. Nobody
 really expects that, and the concept of a Representation from
 WebArch/REST explains it away to most technical people. Plus, we now
 have examples in the wild like OpenGraphProtocol that seem to be
 delivering drinks, politicians, hotels, etc to machine agents at
 Facebook just fine.

It's the arrival of the OpenGraphProtocol which I think warrants a
more careful discussion. It seems to me that we no longer have to try
so hard to convince people to give things de-referencable URIs that
return useful data. It's happening now, and there's immediate and
obvious benefit, i.e. integration with Facebook, better search
ranking, etc.

 But there does seem to be a valid design pattern, or even refactoring
 pattern, in httpRange-14 that is worth documenting.

Refactoring is how I've been thinking about it too, i.e. under what
situations might you want to have separate URIs for a resource and
its description? Dave Reynolds has given some good examples of that.

 Perhaps a good
 place would be http://patterns.dataincubator.org/book/? I think
 positioning httpRange-14 as a MUST instead of a SHOULD or MAY made a
 lot of sense to get the LOD experiment rolling. It got me personally
 thinking about the issue of identity in a practical way as I built web
 applications, that I probably wouldn't otherwise have otherwise done.
 But it would've been easier if grappling with it was optional, and
 there were practical examples of where it is useful, instead of having
 it be an issue of dogma.

My personal viewpoint is that it has to be optional, because there's
already a growing set of deployed examples of people not doing it (OGP
adoption). So how can we help those users understand the pitfalls
and/or the benefits of a slightly cleaner approach? We can also help
them understand how best to publish data to avoid mis-interpretation.

Simplifying ridiculously just to make a point, we seem to have the
following situation:

* Create de-referencable URIs for things. Describe them with OGP
and/or Schema.org
Benefit: Facebook integration, SEO

* Above, plus additional # URIs or 303s.
Benefit: ability to make some finer-grained assertions in some
specific scenarios. Tabulator is happy

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-21 Thread Leigh Dodds
Hi Dave,

Thanks for the response, there's some good examples in there. I'm glad
that this thread is bearing fruit :)

I had a question about one aspect, please excuse the clipping:

On 20 October 2011 10:34, Dave Reynolds dave.e.reyno...@gmail.com wrote:
 ...
 If you have two resources and later on it turns out you only needed one,
 no big deal just declare their equivalence. If you have one resource
 where later on it turns out you needed two then you are stuffed.

Ed referred to refactoring. So I'm curious about refactoring from a
single URI to two. Are developers necessarily stuffed, if they start
with one and later need two?

For example, what if I later changed the way I'm serving data to add a
Content-Location header (something that Ian has raised in the past,
and Michael has mentioned again recently) which points to the source
of the data being returned.

Within the returned data I can include statements about the document
at that URI referred to in the Content-Location header.

Doesn't that kind of refactoring help?
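To be concrete, a minimal sketch of that refactoring (assuming Flask;
the URIs and data are invented for illustration):

from flask import Flask, Response

app = Flask(__name__)

DATA = '<http://example.org/galaxy/m31> a <http://example.org/schema/Galaxy> .'

@app.route("/galaxy/m31")
def galaxy():
    resp = Response(DATA, mimetype="text/turtle")
    # the description document gets its own URI without any redirect;
    # statements about the document can then be made against that URI
    resp.headers["Content-Location"] = "http://example.org/galaxy/m31.ttl"
    return resp

if __name__ == "__main__":
    app.run()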

Presumably I could also just drop in a redirect and adopt the current
303 pattern without breaking anything?

Again, I'm probably missing something, but I'm happy to admit
ignorance if that draws out some useful discussion :)

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-21 Thread Leigh Dodds
Hi,

On 20 October 2011 23:19, Kingsley Idehen kide...@openlinksw.com wrote:
 On 10/20/11 5:31 PM, Dave Reynolds wrote:

 What's more I really don't think the issues is about not understanding
 about the distinction (at least in the clear cut cases). Most people I
 talk to grok the distinction, the hard bit is understanding why 303
 redirects is a sensible way of making it and caring about it enough to
 put those in place.

 What about separating the concept of indirection from its actual
 mechanics? Thus, conversations about benefits will then have the freedom to
 blossom.

 Here's a short list of immediately obvious benefits re. Linked Data (at any
 scale):

 1. access to data via data source names -- millions of developers world wide
 already do this with ODBC, JDBC, ADO.NET, OLE DB etc.. the only issue is
 that they are confined to relational database access and all its
 shortcomings

 2. integration of heterogeneous data sources -- the ability to coherently
 source and merge disparately shaped data culled from a myriad of data
 sources (e.g. blogs, wikis, calendars, social media spaces and networks, and
 anything else that's accessible by name or address reference on a network)

 3. crawling and indexing across heterogeneous data sources -- where the end
 product is persistence to a graph model database or store that supports
 declarative query language access via SPARQL (or even better a combination
 of SPARQL and SQL)

 4. etc...

 Why is all of this important?
 Data access, integration, and management has been a problem that's straddled
 every stage of computer industry evolution. Managers and end-users always
 think about data conceptually, but continue to be forced to deal with
 access, integration, and management in application logic oriented ways. In a
 nutshell, applications have been silo vectors forever, and in doing so they
 stunt the true potential of computing which (IMHO) is ultimately about our
 collective quests for improved productivity.

 No matter what we do, there are only 24 hrs in a day. Most humans taper out
 at 5-6 hrs before physiological system faults kick in, hence our implicit
 dependency on computers for handling voluminous and repetitive tasks.

 Are we there yet?
 Much closer than most imagine. Our biggest hurdle (as a community of Linked
 Data oriented professionals) is a protracted struggle re. separating
 concepts from implementation details. We burn too much time fighting
 implementation details oriented battles at the expense of grasping core
 concepts.

Maybe I'm wrong but I think people, especially on this list,
understand the overall benefits you itemize. The reason we talk
about implementation details is they're important to help people adopt
the technology: we need specific examples.

We get the benefits you describe from inter-linked dereferenceable
URIs, regardless of what format or technology we use to achieve it.
Using the RDF model brings additional benefits.

What I'm trying to draw out in this particular thread is specific
benefits the #/303 additional abstraction brings. At the moment, they
seem pretty small in comparison to the fantastic benefits we get from
data integrated into the web.

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-21 Thread Leigh Dodds
Hi,

On 21 October 2011 08:47, Dave Reynolds dave.e.reyno...@gmail.com wrote:
 ...
 On 20 October 2011 10:34, Dave Reynoldsdave.e.reyno...@gmail.com  wrote:

 ...
 If you have two resources and later on it turns out you only needed one,
 no big deal just declare their equivalence. If you have one resource
 where later on it turns out you needed two then you are stuffed.

 Ed referred to refactoring. So I'm curious about refactoring from a
 single URI to two. Are developers necessarily stuffed, if they start
 with one and later need two?

 For example, what if I later changed the way I'm serving data to add a
 Content-Location header (something that Ian has raised in the past,
 and Michael has mentioned again recently) which points to the source
 of the data being returned.

 Within the returned data I can include statements about the document
 at that URI referred to in the Content-Location header.

 Doesn't that kind of refactoring help?

 Helps yes, but I don't think it solves everything.

 Suppose you have been using http://example.com/lovelypictureofm31 to denote
 M31. Some data consumers use your URI to link their data on M31 to it. Some
 other consumers started linking to it in HTML as an IR (because they like
 the picture and the accompanying information, even though they don't care
 about the RDF). Now you have two groups of users treating the URI in
 different ways. This probably doesn't matter right now but if you decide
 later on you need to separate them then you can't introduce a new URI
 (whether via 303 or content-location header) without breaking one or other
 use. Not the end of the world but it's not a refactoring if the test cases
 break :)

 Does that make sense?

No, I'm still not clear.

If I retain the original URI as the identifier for the galaxy and add
either a redirect or a Content-Location, then I don't see how I break
those linking their data to it as their statements are still made
about the original URI.

But I don't see how I'm breaking people linking to it as if it were an
IR. That group of people are using my resource ambiguously in the
first place. Their links will also still resolve to the same content.

L.


-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs)

2011-10-19 Thread Leigh Dodds
Hi,

I tried it with this URI and got an error:

http://www.bbc.co.uk/programmes/b01102yg#programme

Cheers,

L.

On 17 October 2011 11:41, Yang Squared yang.squ...@gmail.com wrote:
 Following the HTTP-range-14 discussion, we developed a Semantic Web URI
 Validator named Hyperthing which helps with publishing Linked Data. We
 particularly investigated what happens with temporary and
 permanent redirects (e.g. 301 and 302 redirections) of a Semantic Web URI (303
 and hash URIs).
 http://www.hyperthing.org/
 Hyperthing mainly functions for three purposes:
 1) It determines if the requested URI identifies a Real World Object or a
 Web document;
 2) It checks whether the URI's publishing method follows the W3C hash URI
 and 303 URI practice;
 3) It can be used to check the validity of the chains of the redirection
 between the Real World Object URIs and Document URIs to prevent the data
 publisher mistakenly redirecting between these two kinds (e.g. it checks
 against redirections, including 301, 302 and 307).
 For more information please read
  Dereferencing Cool URI for the Semantic Web: What is 200 OK on the Semantic
 Web?
 http://dl.dropbox.com/u/4138729/paper/dereference_iswc2011.pdf
 Any suggestion is welcome.


-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-19 Thread Leigh Dodds
Hi,

[Aside: changing the subject line so we can have a clearer discussion]

On 17 October 2011 14:58, Norman Gray nor...@astro.gla.ac.uk wrote:
...
 I've done far fewer talks of this type than Tom has, but I've never found 
 anyone having difficulty here, either.  Mind you, I never talk of 
 'information resource' or httpRange-14.

 For what it's worth, I generally say something along the lines of This URI, 
 X, is the name of a galaxy.  If you put that URI into your
 browser, you can't get the galaxy back, can you, because the galaxy is too 
 big to fit inside your computer.  So something different has to
 happen, doesn't it?  A remark about Last-Modified generally seals the deal.

I've done the same, and people do quite often get it. At least for a
few minutes :) I think my experience echoes Rob's more than Tom's.
I've had more than one Linked Data talk/tutorial de-railed by debate
and discussion of the issue when there are much more interesting
aspects to explore.

While I've not used the galaxy example, I have taken similar
approaches. But I can also imagine saying, for example:

This URI, X, is the name of a galaxy.  If you put that URI into your
browser, obviously you can't get the galaxy back, can you. So when you
request it, you get back a representation of it. You know, just like
when you request a file from a web server you don't download the
*actual* file, just a representation of it. Possibly in another
format.

And further, if someone asked about Last-Modified dates:

Last-Modified? Well as it turns out the Last-Modified date isn't
defined to be the date that a resource last changed. It's up to the
origin server to decide what it means. So for something like a galaxy,
it can be the date of our last observation.

My point being that web architecture already has a good explanation as
to how real-world, or even digital, things are passed around the
internet. That's why we have the Resource and Representation
abstractions in the first place.

So, can we turn things on their head a little. Instead of starting out
from a position that we *must* have two different resources, can we
instead highlight to people the *benefits* of having different
identifiers? That makes it more of a best practice discussion and one
based on trade-offs: e.g. this class of software won't be able to
process your data correctly, or you'll be limited in how you can
publish additional data or metadata in the future.

I don't think I've seen anyone approach things from that perspective,
but I can't help but think it'll be more compelling. And it also has
the benefits of not telling people that they're right or wrong, but
just illustrate what trade-offs they are making.

Is this not something we can do on this list? I suspect it'd be more
useful than attempting to categorise, yet again, the problems of hash
vs slash URIs. Although a canonical list of those might be useful to
compile once and for all.

Anyone want to start things off?

As a leading question: does anyone know of any deployed semantic web
software that will reject or incorrectly process data that flagrantly
ignores httprange-14?

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-19 Thread Leigh Dodds
Hi,

On 19 October 2011 18:44, Kingsley Idehen kide...@openlinksw.com wrote:
 
 So, can we turn things on their head a little. Instead of starting out
 from a position that we *must* have two different resources, can we
 instead highlight to people the *benefits* of having different
 identifiers?

 But you don't have two different resources. Please correct me if I am
 reading you inaccurately here, but are you saying that:

 http://dbpedia.org/resource/Linked_Data and
 http://dbpedia.org/page/Linked_Data == two different resources?

 I see:

 1. 2 URIs
 2. a generic URI (serving as a Name) and a purpose specific URI called a URL
 that serves as a data access address -- still two identifiers albeit split
 by function.

RFC 3986:

A Uniform Resource Identifier (URI) is a compact sequence of
characters that identifies an abstract or physical resource.

2 URIs, therefore 2 resources.

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-19 Thread Leigh Dodds
Hi Jonathan

On 19 October 2011 18:36, Jonathan Rees j...@creativecommons.org wrote:
 On Wed, Oct 19, 2011 at 12:59 PM, Leigh Dodds leigh.do...@talis.com wrote:

 So, can we turn things on their head a little. Instead of starting out
 from a position that we *must* have two different resources, can we
 instead highlight to people the *benefits* of having different
 identifiers? That makes it more of a best practice discussion and one
 based on trade-offs: e.g. this class of software won't be able to
 process your data correctly, or you'll be limited in how you can
 publish additional data or metadata in the future.

 I don't think I've seen anyone approach things from that perspective,
 but I can't help but think it'll be more compelling. And it also has
 the benefits of not telling people that they're right or wrong, but
 just illustrate what trade-offs they are making.

 Is this not something we can do on this list? I suspect it'd be more
 useful than attempting to categorise, yet again, the problems of hash
 vs slash URIs. Although a canonical list of those might be useful to
 compile once and for all.

 Anyone want to start things off?

 Sure.  http://www.w3.org/2001/tag/2011/09/referential-use.html

Thanks for the pointer. That's an interesting document. I've read it
once but need to digest it a bit further.

The crux of the issue, and what I was getting at in this thread is
what you refer to towards the end:

It is possible that D2 and S2 can be used side by side by different
communities for quite a while before a collision of the sort described
above becomes a serious interoperability problem. On the other hand,
when the conflict does happen, it will be very painful.

I think what I'm interested in is what problems might surface and
approaches for mitigating them. I'm particularly curious whether
heuristics might be used to disambiguate or remove conflict.

 As a leading question: does anyone know of any deployed semantic web
 software that will reject or incorrectly process data that flagrantly
 ignores httprange-14?

 Tabulator.

Yes. That's the only piece of software I've heard of that has problems.



-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-19 Thread Leigh Dodds
Hi,

On 19 October 2011 20:48, Kingsley Idehen kide...@openlinksw.com wrote:
 On 10/19/11 3:16 PM, Leigh Dodds wrote:
 
 But you don't have two different resources. Please correct me if I am
 reading you inaccurately here, but are you saying that:

 http://dbpedia.org/resource/Linked_Data and
 http://dbpedia.org/page/Linked_Data == two different resources?

 I see:

 1. 2 URIs
 2. a generic URI (serving as a Name) and a purpose specific URI called a
 URL
 that serves as a data access address -- still two identifiers albeit
 split
 by function .

 RFC 3986:

 A Uniform Resource Identifier (URI) is a compact sequence of
 characters that identifies an abstract or physical resource.

 Yes, I agree with that.

 2 URIs, therefore 2 resources.

 I disagree with your interpretation though.

But I'm not interpreting anything there. The definition is that a URI
identifies a resource. Ergo two different URIs identify two resources.

Whether those resources might be related to one another, or even
equivalent is an entirely different matter.

 Identifiers are names / handles. Thus, you have Names that resolve to actual
 data albeit via different levels of indirection.

 http://dbpedia.org/resource/Linked_Data and
 http://dbpedia.org/page/Linked_Data are routes to different representations
 of the same data. /resource/ (handle or name) is an indirect access route
 while /page/ is direct (address i.e., a location name) albeit with
 representation specificity i.e., HTML in the case of DBpedia.

 I am very happy that we've been able to narrow our differing views to
 something very concrete. Ultimately, we are going to arrive at clarity, and
 that's all that matters to me, fundamentally.

*That* all seems to be interpretation to me.

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-19 Thread Leigh Dodds
Hi,

On 19 October 2011 23:36, Nathan nat...@webr3.org wrote:
 Leigh Dodds wrote:

 On 19 October 2011 20:48, Kingsley Idehen kide...@openlinksw.com wrote:

 On 10/19/11 3:16 PM, Leigh Dodds wrote:

 RFC 3986:

 A Uniform Resource Identifier (URI) is a compact sequence of
 characters that identifies an abstract or physical resource.

 Yes, I agree with that.

 2 URIs, therefore 2 resources.

 I disagree with your interpretation though.

 But I'm not interpreting anything there. The definition is a URI
 identifies a resource. Ergo two different URIs identify two resources.

 Nonsense, and I'm surprised to hear it.

 Given two distinct URIs the most you can determine is that you have two
 distinct URIs.

 You do not know how many resources are identified, there may be no
 resources, one, two, or full sets of resources.

 Do see RFC3986, especially the section on equivalence.


OK, so maybe there is interpretation here :)

My reading is that, without additional knowledge, we should assume
that different URIs identify different resources. I think the wording
of RFC 3986 is fairly clear that a URI identifies a resource, so
assuming multiple resources for multiple URIs is fine - as a starting
position. I do understand that two URIs can be aliases.

The section on equivalence you refer to suggests ways to identify
equivalence ranging from syntactic comparisons up to network protocol
operations. The latter gives us additional information (status codes,
headers) that can determine equivalence.

To go back to Kingsley's original example, I don't see any equivalence
of those URIs at the syntactic or network level.
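For illustration, the cheapest rung on that ladder, simple syntactic
normalisation, looks something like this in Python (a sketch only):

from urllib.parse import urlsplit

def normalise(uri):
    # scheme and host are case-insensitive per RFC 3986
    parts = urlsplit(uri)
    return parts._replace(scheme=parts.scheme.lower(),
                          netloc=parts.netloc.lower()).geturl()

a = normalise("HTTP://DBpedia.org/resource/Linked_Data")
b = normalise("http://dbpedia.org/page/Linked_Data")
print(a == normalise("http://dbpedia.org/resource/Linked_Data"))  # True
print(a == b)  # False: nothing syntactic relates the two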

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Beyond the Triple Count

2011-09-28 Thread Leigh Dodds
Hi,

I did a talk at semtech this week about some ideas for improving how
we document, publish and assess datasets. I've done a write-up which
might be of interest:

http://blog.kasabi.com/2011/09/28/beyond-the-triple-count/

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Question: Authoritative URIs for Geo locations? Multi-lingual labels?

2011-09-09 Thread Leigh Dodds
Hi,

As well as the others already mentioned there's also Yahoo Geoplanet:

http://beta.kasabi.com/dataset/yahoo-geoplanet

This has multi-lingual labels and is cross-linked to the Ordnance
Survey data and Dbpedia, though that linking could be improved.

As for a list, there are currently 34 geography related datasets
listed in Kasabi here:

http://beta.kasabi.com/browse/datasets/results/og_category%3A147

Cheers,

L.

On 8 September 2011 15:38, M. Scott Marshall mscottmarsh...@gmail.com wrote:
 It seems that dbpedia is a de facto source of URIs for geographical
 place names. I would expect to find a more specialized source. I think
 that I saw one mentioned here in the last few months. Are there
 alternatives that are possibly more fine-grained or designed
 specifically for geo data? With multi-lingual labels? Perhaps somebody
 has kept track of the options on a website?

 -Scott

 --
 M. Scott Marshall
 http://staff.science.uva.nl/~marshall

 On Thu, Sep 8, 2011 at 3:07 PM, Sarven Capadisli i...@csarven.ca wrote:
 On Thu, 2011-09-08 at 14:01 +0100, Sarven Capadisli wrote:
 On Thu, 2011-09-08 at 14:07 +0200, Karl Dubost wrote:
  # Using RDFa (not implemented in browsers)
 
 
  <ul xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" id="places-rdfa">
      <li><span
          about="http://www.dbpedia.org/resource/Montreal"
          geo:lat_long="45.5,-73.67">Montréal</span>, Canada</li>
      <li><span
          about="http://www.dbpedia.org/resource/Paris"
          geo:lat_long="48.856578,2.351828">Paris</span>, France</li>
  </ul>
 
  * Issue: Latitude and Longitude not separated
    (have to parse them with regex in JS)
  * Issue: xmlns with <!doctype html>
 
 
  # Question
 
  On RDFa vocabulary, I would really like a solution with geo:lat and 
  geo:long, Ideas?

 Am I overlooking something obvious here? There are lat, long properties
 in the wgs84 vocab. So,

 <span about="http://dbpedia.org/resource/Montreal">
     <span property="geo:lat"
           content="45.5"
           datatype="xsd:float"></span>
     <span property="geo:long"
           content="-73.67"
           datatype="xsd:float"></span>
     Montreal
 </span>

 Tabbed for readability. You might need to get rid of whitespace.

 -Sarven

 Better yet:

 <li about="http://dbpedia.org/resource/Montreal">
    <span property="geo:lat"
 ...


 -Sarven





-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Question: Authoritative URIs for Geo locations? Multi-lingual labels?

2011-09-09 Thread Leigh Dodds
Hi Kingsley,

On 9 September 2011 15:20, Kingsley Idehen kide...@openlinksw.com wrote:
 On 9/9/11 8:58 AM, Leigh Dodds wrote:

 Hi,

 As well as the others already mentioned there's also Yahoo Geoplanet:

 http://beta.kasabi.com/dataset/yahoo-geoplanet

 This has multi-lingual labels and is cross-linked to the Ordnance
 Survey data, Dbpedia, but that could be improved.

 As for a list, there are currently 34 geography related datasets
 listed in Kasabi here:

 http://beta.kasabi.com/browse/datasets/results/og_category%3A147

 Leigh,

 Can anyone access these datasets or must they obtain a kasabi account en
 route to authenticated access?

As I've said (repeatedly!) there's no authentication around any of
the Linked Data. That might be an option for publishers in future, but not
during the beta and not for any of the open datasets which we've
published currently.

API keys are only required for the APIs, e.g. SPARQL, search, etc. The
choice of authentication options will increase in future.

So I encourage you to actually go and have a look. There's a direct
link to the Linked Data views from every homepage.

Here's a pointer to the blog post I wrote and circulated after our
last discussion:

http://blog.kasabi.com/2011/08/12/linked-data-in-kasabi/

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: CAS, DUNS and LOD (was Re: Cost/Benefit Anyone? Re: Vote for my Semantic Web presentation at SXSW)

2011-08-24 Thread Leigh Dodds
Hi,

On 23 August 2011 15:17, Gannon Dick gannon_d...@yahoo.com wrote:
 Either "Linked Data ecosystem" or "linked data Ecosystem" is a dangerously
 flawed paradigm, IMHO.  You don't improve MeSH by
 flattening it, for example; it is what it is. Since CAS numbers are not a
 directed graph, an algorithmic transform to a URI (which *is* a
 directed graph) risks the creation of a new irreconcilable taxonomy.
 For example, Nitrogen is ok to breathe and liquid Nitrogen is a
 not very practical way to chill wine.

A URI isn't a directed graph. You can use them to build one by making
statements though.

Setting aside any copyright issues, the CAS identifiers are useful
Natural Keys [1]. As they're well deployed, using them to create URIs
[2] is sensible as it simplifies the process of linking between
datasets [3].

To answer Patrick's question: to help bridge between systems that
only use the original literal version, rather than the URIs, we
should ensure that the literal keys are included in the data [4].

These are well deployed patterns and, from my experience, make it
really simple and easy to bridge and link between different datasets
and systems.
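As a sketch of what I mean (the base URI below is hypothetical, not a
real service):

CAS_TEMPLATE = "http://example.org/chemical/cas/%s"

def cas_uri(cas_number):
    # preserve the natural key verbatim so that anyone else holding the
    # same CAS number can mint exactly the same URI and link up [2][3]
    return CAS_TEMPLATE % cas_number

print(cas_uri("7727-37-9"))  # nitrogen
# and, per [4], also publish the plain literal key in the data itself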

Cheers,

L.

[1]. http://patterns.dataincubator.org/book/natural-keys.html
[2]. http://patterns.dataincubator.org/book/patterned-uris.html
[3]. http://patterns.dataincubator.org/book/shared-keys.html
[4]. http://patterns.dataincubator.org/book/literal-keys.html

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: CAS, DUNS and LOD (was Re: Cost/Benefit Anyone? Re: Vote for my Semantic Web presentation at SXSW)

2011-08-24 Thread Leigh Dodds
Hi,

On 24 August 2011 15:40, David Wood da...@3roundstones.com wrote:
 On Aug 24, 2011, at 2:44, Leigh Dodds leigh.do...@talis.com wrote:

 Hi,

 On 23 August 2011 15:17, Gannon Dick gannon_d...@yahoo.com wrote:
 Either "Linked Data ecosystem" or "linked data Ecosystem" is a dangerously
 flawed paradigm, IMHO.  You don't improve MeSH by
 flattening it, for example; it is what it is. Since CAS numbers are not a
 directed graph, an algorithmic transform to a URI (which *is* a
 directed graph) risks the creation of a new irreconcilable taxonomy.
 For example, Nitrogen is ok to breathe and liquid Nitrogen is a
 not very practical way to chill wine.

 A URI isn't a directed graph. You can use them to build one by making
 statements though.

 Setting aside any copyright issues, the CAS identifiers are useful
 Natural Keys [1]. As they're well deployed, using them to create URIs
 [2] is sensible

 Hi Leigh,

 Right.  Unfortunately it is also illegal :/

Yes, I read the first part of the thread! I was merely pointing out
the useful patterns for projecting identifiers into URIs.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: New draft of Linked Data Patterns book

2011-08-22 Thread Leigh Dodds
Hi,

On 20 August 2011 16:01, Giovanni Tummarello
giovanni.tummare...@deri.org wrote:
 Seems pretty interesting, clearly out of practical experience !

Thanks Giovanni! Yes, I've been trying to apply practical experience
wherever possible. I'm very keen on collecting useful application
patterns that may help others build good RDF and Linked Data based apps.

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Job: Data Engineer, Kasabi

2011-08-12 Thread Leigh Dodds
Hi,

Just a reminder to people that this job opening is still available. The
role involves hands-on work with a wide range of
different data types, covering both free, open data and commercial
datasets. Over time we expect to be doing more data analysis using
Map-Reduce and Pregel, as well as interlinking and enrichment.

We're looking for someone who is enthusiastic about working with,
analysing, and demonstrating the value of data. If you want a hands-on
role working with data, then this should definitely be of interest.

More details at [1] or feel free to drop me an email with any
questions or applications.

[1] http://tbe.taleo.net/NA9/ats/careers/requisition.jsp?org=TALIS&cws=1&rid=41

Cheers,

L.

On 17 June 2011 16:22, Leigh Dodds leigh.do...@talis.com wrote:
 Hi,

 Short job advert: we're looking for someone to join the Kasabi team as
 a Data Engineer. The role will involve working with RDF and Linked
 Data so should be of interest to this community!

 More information at [1]. Feel free to get in touch with me personally
 if you want more information.

 Cheers,

 L.

 [1] 
 http://tbe.taleo.net/NA9/ats/careers/requisition.jsp?org=TALIS&cws=1&rid=41

 --
 Leigh Dodds
 Programme Manager, Talis Platform
 Mobile: 07850 928381
 http://kasabi.com
 http://talis.com

 Talis Systems Ltd
 43 Temple Row
 Birmingham
 B2 5LS




-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: DBpedia: limit of triples

2011-08-09 Thread Leigh Dodds
Hi,

On 9 August 2011 11:26, Jörn Hees j_h...@cs.uni-kl.de wrote:
 ...
 I also guess it would be better to construct the given document first from 
 the outgoing triples, maybe preferring the ontology mapped triples, and then 
 incoming links up to a 2000 triples limit (if necessary to limit bandwidth).
 That would fit the description in the above mentioned section way better than 
 the current implementation.

You could also try a mirror to see if that provides better facilities, e.g. [1]

Cheers,

L.

[1]. http://beta.kasabi.com/dataset/dbpedia-36

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Leigh Dodds
Hi,

On 12 July 2011 18:45, Pablo Mendes pablomen...@gmail.com wrote:
 Dear fellow Linked Open Data publishers and consumers,
 We are in the process of regenerating the next LOD cloud diagram and
 associated statistics [1].
 ...

This email prompted a discussion about how the data collection or
diagram could be improved or updated. As CKAN is an open platform and
anyone can add additional tags to datasets, why doesn't everyone who
is interested in seeing a particular improvement or alternate view of
the data just go ahead and do it? There's no need to require all this
to be done by one team on a fixed schedule.

Some light co-ordination between people doing similar analyses would
be worthwhile, but it wouldn't be hard to, e.g. tag datasets based on
whether their Linked Data or SPARQL endpoint is available regularly,
whether they're currently maintained, or (my current bugbear) whether
the data dumps they publish parse with more than one tool chain.

It'd be nice to see many different aspects of the cloud being explored.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Leigh Dodds
Hi,

On 13 July 2011 13:05, Bernard Vatant bernard.vat...@mondeca.com wrote:
 Re. availability, just a reminder of SPARQL Endpoints Status service
 http://labs.mondeca.com/sparqlEndpointsStatus/index.html
 As of today 80% (192/240) endpoints registered at CKAN are up and running.
 Monitor grey dots (still alive?) for candidate passed out datasets ...

Well as Kingsley pointed out SPARQL is only one metric. Whether the
URIs still resolve is arguably most important for the Linked Data
diagram, but service availability is a good thing to monitor.

However it's also worth noting that there are mirrors of a number of
datasets. E.g. we have 70+ datasets in Kasabi, some new to the cloud,
some of which are mirrors. Not all (any?) of those SPARQL endpoints
are on your list.

Cheers,

L.
-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Leigh Dodds
Hi,

On 13 July 2011 14:30, Kingsley Idehen kide...@openlinksw.com wrote:
 Can you ping me or reply to this list with a list of missing SPARQL
 endpoints? Alternatively, you can bookmark them on del.icio.us using the tag:
 sparql_endpoint.

 Here is my collection: http://www.delicious.com/kidehen/sparql_endpoint .

The data is all in a machine-readable form. See:

http://data.kasabi.com/datasets

The URI supports conneg so you can follow rdfs:seeAlso links to all of
the VoiD descriptions and hence to the sparql endpoints, plus all of
the other APIs.
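For example, a quick sketch with Python (requests plus rdflib; the
content type I ask for is an assumption on my part):

import requests
from rdflib import Graph, Namespace

RDFS = Namespace("http://www.w3.org/2000/01/rdf-schema#")

resp = requests.get("http://data.kasabi.com/datasets",
                    headers={"Accept": "text/turtle"})
g = Graph()
g.parse(data=resp.text, format="turtle")

# chase the rdfs:seeAlso links to the VoiD descriptions, which in turn
# carry the SPARQL endpoints and the other APIs
for _, _, void_doc in g.triples((None, RDFS.seeAlso, None)):
    print(void_doc)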

It'd be nice if the LD cloud diagram used other machine-readable
sources where possible. I know CKAN is a good focal point for helping
curate activity, but it's also frustrating to have to copy data around,
whether manually or otherwise.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: WebID vs. JSON (Was: Re: Think before you write Semantic Web crawlers)

2011-06-22 Thread Leigh Dodds
Hi,

On 22 June 2011 15:41, William Waites w...@styx.org wrote:
 What does WebID have to do with JSON? They're somehow representative
 of two competing trends.

 The RDF/JSON, JSON-LD, etc. work is supposed to be about making it
 easier to work with RDF for your average programmer, to remove the
 need for complex parsers, etc. and generally to lower the barriers.

 The WebID arrangement is about raising barriers. Not intended to be
 the same kind of barriers, certainly the intent isn't to make
 programmer's lives more difficult, rather to provide a good way to do
 distributed authentication without falling into the traps of PKI and
 such.

 While I like WebID, and I think it is very elegant, the fact is that I
 can use just about any HTTP client to retrieve a document whereas to
 get rdf processing clients, agents, whatever, to do it will require
 quite a lot of work [1]. This is one reason why, for example, 4store's
 arrangement of /sparql/ for read operations and /data/ and /update/
 for write operations is *so* much easier to work with than Virtuoso's
 OAuth and WebID arrangement - I can just restrict access using all of
 the normal tools like apache, nginx, squid, etc..

 So in the end we have some work being done to address the perception
 that RDF is difficult to work with and on the other hand a suggestion
 of widespread putting in place of authentication infrastructure which,
 whilst obviously filling a need, stands to make working with the data
 behind it more difficult.

 How do we balance these two tendencies?

By recognising that often we just need to use existing technologies
more effectively and more widely, rather than throw more technology at
a problem, thereby creating an even greater education and adoption
problem?

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Squaring the HTTP-range-14 circle

2011-06-17 Thread Leigh Dodds
Hi,

On 17 June 2011 14:04, Tim Berners-Lee ti...@w3.org wrote:

 On 2011-06 -17, at 08:51, Ian Davis wrote:
 ...

 Quite. When a facebook user clicks the Like button on an IMDB page
 they are expressing an opinion about the movie, not the page.

 BUT when they click a Like button on a blog they are expressing they like the
 blog, not the movie it is about.

 AND when they click like on a facebook comment they are
 saying they like the comment not the thing it is commenting on.

 And on Amazon people say "I found this review useful" to
 like the review on the product being reviewed, separately from
 rating the product.
 So there is a lot of use out there which involves people expressing
 stuff in general about the message not its subject.

Well even that's debatable.

I just had to go and check whether Amazon reviews and Facebook
comments actually do have their own pages. That's because I've never
seen them presented as anything other than objects within another
container, either in a web page or a mobile app. So I think you could
argue that when people are linking and marking things as useful,
they're doing that on a more general abstraction, i.e. the Work (to
borrow FRBR terminology) not the particular web page.

And that's presumably the way that Facebook and Amazon see it too
because that data is associated with the status or review in whichever
medium I look at it (page or app).

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Squaring the HTTP-range-14 circle

2011-06-17 Thread Leigh Dodds
Hi,

On 17 June 2011 15:32, Kingsley Idehen kide...@openlinksw.com wrote:
 On 6/17/11 3:11 PM, Leigh Dodds wrote:

 I just had to go and check whether Amazon reviews and Facebook
 comments actually do have their own pages. That's because I've never
 seen them presented as anything other than objects within another
 container, either in a web page or a mobile app. So I think you could
 argue that when people are linking and marking things as useful,
 they're doing that on a more general abstraction, i.e. the Work (to
 borrow FRBR terminology) not the particular web page.

 You have to apply context to your statement above. Is the context: WWW as an
 Information space or Data Space?

I can't answer that because I don't know what you mean by those terms.
It's just a web of resources as far as I'm concerned.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: For our UK readers

2011-05-25 Thread Leigh Dodds
Let's hope that any fall-out doesn't come back to me as the person to
whom errors are reported!

Arguably the generatorAgent and errorReportsTo predicates ought to be
removed if you've done further hand editing/changes to the file, but I
doubt anyone does that in practice.

Cheers,

L.

On 24 May 2011 15:07, Hugh Glaser h...@ecs.soton.ac.uk wrote:
 http://who.isthat.org/id/CTB

 Have I got the RDF right?
 Not sure foaf is the right thing for this.
 Should there be a blank node somewhere in there?
 Suggestions for improvements welcome.

 Hugh





-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: implied datasets

2011-05-23 Thread Leigh Dodds
Hi William,

On 23 May 2011 14:01, William Waites w...@styx.org wrote:
 ...
 Then for each dataset that I have that uses the links to this space, I
 count them up and make a linkset pointing at this imaginary dataset.

 Obviously the same strategy for anywhere there exist some kind of
 standard identifiers that are not URIs in HTTP.

 Does this make sense?

I'm not sure that the dataset is imaginary, but what you're doing
seems eminently sensible to me. I've been working on a little project
that I hope to release shortly that aims to facilitate this kind of
linking, especially where those non-URI identifiers, or Literal Keys
[1] are
used to build patterned URIs.

 Can we sensibly talk about and even assert the existence of a dataset
 of infinite size? (whatever existence means).

I think so, we can assert what kinds of things it contains and
describe it in general terms, even if we can't enumerate all of its
elements.

It may be more natural to think of these as services than datasets,
i.e. a service that accepts some keys as input and returns a
set of assertions. In this case the assertions would be links to other
datasets.
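A minimal sketch of such a service (assuming Flask; all the URIs here
are invented for illustration):

from flask import Flask, Response

app = Flask(__name__)

@app.route("/isbn/<isbn>")
def isbn_links(isbn):
    # accept a key as input, return a set of link assertions
    turtle = ("<http://example.org/isbn/%s> "
              "<http://www.w3.org/2002/07/owl#sameAs> "
              "<http://other.example.org/book/%s> .\n" % (isbn, isbn))
    return Response(turtle, mimetype="text/turtle")

if __name__ == "__main__":
    app.run()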

 Is this an abuse of DCat/voiD?

Not in my view, I think the notion of dataset is already pretty broad.

 Are this class of datasets subsets of sameAs.org (assuming sameAs.org
 to be complete in principle?)

Subsets if they only asserted sameAs links, but I think you're
suggesting that this may be too strict. I think there's potentially a
whole set of related predicate based services [2] that provide
useful indexes of existing datasets, or expose additional annotations
of extra sources.

The project I've been working on facilitates not just sameAs links,
but any form of links that can be derived from shared URI patterns.
This would include topic/subject based linking. ISBN was one of the use
cases I had in mind, but there are others.

Cheers,

L.

[1]. http://patterns.dataincubator.org/book/literal-keys.html
[2]. http://www.ldodds.com/blog/2010/03/predicate-based-services/


-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Why does rdf-sparql-protocol say to return 500 when refusing a query?

2011-04-28 Thread Leigh Dodds
Hi,

On 27 April 2011 11:18, Alexander Dutton alexander.dut...@oucs.ox.ac.uk wrote:
 On 17/04/11 21:07, Hugh Glaser wrote:

 As a consumer I would like to be able to distinguish a refusal to answer
 from a failure of the web server to access the store, for example.

 In the general case, that was my concern, too. AFAICT from the spec, you
 aren't precluded from returning e.g. 504 if the store has disappeared.

 I've always (perhaps wrongly) equated a 500 with the web server encountering
 some exceptional and *unexpected* condition¹; specifically, an uncaught
 exception in the web application. As such I've always taken a 500 to be
 indicative of a bug which should be fixed to fail more gracefully, perhaps
 with a more appropriate code from the 4xx/5xx range².

 As a web developer I always try to 'fix' situations where my code returns a
 500. As a consumer I will take a 500 to be an application error and attempt
 to inform the webmaster of the inferred 'bug'.

 I can think of the following situations where a SPARQL endpoint might not
 return a result:

 * Syntax error (400)
 * Accept range mismatch (406)
 * Query rejected off-hand as too resource-intensive (403?)
 * Store unreachable (504?)
 * Server overloaded (503?)
 * Query timed out (504?, 403?)

+1 to using the full range of HTTP status codes.

Personally I don't really see it as revisionist or retro-fitting to
use HTTP status codes to indicate these application-level semantics.
There's a good range of status codes available and they're reasonably
well defined for these broad scenarios, IMO. Especially so when you
use additional headers, e.g. Retry-After (as David Booth noted) to
communicate additional information at the protocol level.
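
For example, an overloaded endpoint might respond along these lines (a
sketch, not taken from any particular implementation):

  HTTP/1.1 503 Service Unavailable
  Retry-After: 120
  Content-Type: text/plain

  The endpoint is temporarily over capacity; please retry later.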

This is more about good web application engineering than anything to
do with the SPARQL protocol per se.

However it may be useful to define a standard response format and
potentially error messages to help client apps/users distinguish
between more fine-grained error states. I suggested this during
discussion of the original protocol specification but the WG decided
it wasn't warranted initially [1]. Based on this discussion I'm not
sure implementation experience has moved on enough, or converged
enough to feed this back as part of SPARQL 1.1.

Doesn't stop the community agreeing on some conventions/best practices though.

Cheers,

L.

[1]. 
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Jan/0106.html

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Navigating Data (was Re: Take2: 15 Ways to Think About Data Quality (Just for a Start) )

2011-04-28 Thread Leigh Dodds
Hi,

Changed subject line to match topic:

On 15 April 2011 14:47, glenn mcdonald gl...@furia.com wrote:
 This reminds me to come back to the point about what I initially
 called Directionality, and Dave improved to Modeling Consistency.

 ...
 - But even in RDF, directionality poses a significant discovery
 problem. In a minimal graph (let's say minimal graph means that each
 relationship is asserted in only one direction, so there's no
 relationship redundancy), you can't actually explore the data
 navigationally. You can't go to a single known point of interest, like
 a given president, and explore to find out everything the data holds
 and how it connects...

Doesn't this really depend on how the navigational interface is constructed?

If we're looking purely at Linked Data views created using a Concise
Bounded Description, then yes I agree, if there are no back links in
the data, then navigation is problematic.

But if we use different algorithms to describe the views, or
supplement them with SPARQL queries, then those navigational links can
be presented, e.g. other resources that refer to this resource.
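
For example, the inward links for a given resource can be found with a
query along these lines (the resource URI is invented):

  SELECT ?other ?rel
  WHERE { ?other ?rel <http://example.org/president/42> }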

I think, as you noted elsewhere, inverse links could also be inferred
based on the schema. This simplifies the navigation UI, as the links
are part of the data.
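
E.g. a schema might declare (terms invented for illustration):

  ex:presidentOf owl:inverseOf ex:hasPresident .

so that a reasoner can materialize the missing direction of each link.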

 ...You can explore the *outward* relationships from
 any given point, but to find out about the *inward* relationships you
 have to keep doing new queries over the entire dataset.

Yes.

 ...The same basic
 issue applies to an XML representation of the data as a tree: you can
 squirrel your way down, but only in the direction the original modeler
 decided was down. If you need a different direction, you have to
 hire a hypersquirrel.

Well an XML node typically has a reference to its parent (it does in
the DOM anyway) so moving back up the tree is easy.

 - Of course, most RDF-presenting systems recognize this as a usability
 problem, and address it by turning the minimal graph into a redundant
 graph for UI purposes. Thus in a data-browser UI you usually see, for
 a given node, lists of both outward and inward relationships. This is
 better, but if this abstraction is done at the UI layer, you still
 lose it once you drop down into the SPARQL realm. This makes the
 SPARQL queries harder to write, because you can't write them the way
 you logically think about the question, you have to write them the way
 the data thinks about the question. And this skew from real logic to
 directional logic can make them *much* harder to understand or
 maintain, because the directionality obscures the purpose and reduces
 the self-documenting nature of the query.

Assuming you don't materialize the inferences directly in the data,
then isn't the answer to have both the SPARQL endpoint and the
navigational UI use the same set of inferred data?

 All of this is *much* better, in usability terms, if the data is
 redundantly, bi-directionally connected all the way down to the level
 of abstraction at which you're working. Now you can explore to figure
 out what's there, and you can write your queries in the way that makes
 the most human sense. The artificial skew between the logical
 structure and the representational structure has been removed. This is
 perfectly possible in an RDF-based system, of course, if the software
 either generates or infers the missing inverses. We incur extra
 machine overhead to reduce the human cognitive burden. I contend this
 should be considered a nearly-mandatory best-practice for linked data,
 and that propagating inverses around the LOD cloud ought to be one of
 things that makes the LOD cloud *a thing*, rather than just a
 collection of logical silos.

The same problem exists on the document web: it can be useful to know
what links to a specific page. There are various techniques to help
address that, e.g. centralized indexes that can expose more of the
graph (Google) or point-to-point mechanisms for notifying links (e.g.
Pingback, etc).

With RDF systems we may be able to infer some extra links, but with
Linked Data we can't infer all of them, so we have the same issue and
can deploy very similar infrastructure to solve the problem.

Currently we have SameAs.org, which is specialized for one type of
linking, but it'd be nice to see others [1]. And there have been
experiments with various pingback/notification services for Linked
Data. Are any of the latter being widely deployed/used?

Cheers,

L.

[1]. http://www.ldodds.com/blog/2010/03/predicate-based-services/

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Minting URIs: how to deal with unknown data structures

2011-04-18 Thread Leigh Dodds
Hi,

On 15 April 2011 13:48, Frans Knibbe frans.kni...@geodan.nl wrote:
 I have acquired the first part (authority) of my URIs, let's say it is
 lod.mycompany.com. Now I am faced with the question: How do I come up with a
 URI scheme that will stand the test of time?

You might be interested in the Identifier Patterns documented here:

http://patterns.dataincubator.org/book/identifier-patterns.html

There's also the Designing URI Sets for the Public Sector document,
which provides the guidance for creating URIs for UK government data:

http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Possible Idea For a Sem Web Based Game?

2010-11-22 Thread Leigh Dodds
Hi,

On 20 November 2010 17:28, Melvin Carvalho melvincarva...@gmail.com wrote:
 I was thinking about creating a simple game based on semantic web
 technologies and linked data.

 Some on this list may be too young to remember this, but there used to
 be game books where you would choose your own adventure.

 http://en.wikipedia.org/wiki/Choose_Your_Own_Adventure

Yes, I've thought this would make a really nice showcase too.

Liam Quinn built a nice little demo [1] of something like this. I was
also looking at the Inform interactive fiction engine [2] (again!)
recently. The engine is essentially a set of core rules about how a
game world operates. The core rules can be extended, and the ability
for users to interact with the world can be inferred from those rules,
e.g. whether you can climb onto or inside something. Struck me that
it'd be possible to (re-)build a lot of that using RDF, OWL, RIF.

Cheers,

L.

[1]. http://dirk.holoweb.net/~liam/rdfg/rdfg.cgi
[2]. http://www.inform-fiction.org/

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Google Refine 2.0

2010-11-12 Thread Leigh Dodds
Hi David,

Congratulations on getting the 2.0 release out. I'm looking forward to
working with it some more.

Kingsley asked about extensions. You've already mentioned the work
done at DERI, and I've previously pointed at the reconciliation API I
built over the Talis Platform [1].

I used Refine's excellent plugin architecture to create a simple
upload tool for loading Talis Platform stores. This hooks into both
core Gridworks and the DERI RDF extension to support POSTing of the
RDF to a service. Code is just a proof of concept [2] but I have a
more refined version that I parked briefly whilst awaiting the 2.0
release.

I think this nicely demonstrates how open Refine is as a tool.

Cheers,

L.

[1]. 
http://www.ldodds.com/blog/2010/08/gridworks-reconciliation-api-implementation/
[2]. https://github.com/ldodds/gridworks-talisplatform

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Google Refine 2.0

2010-11-12 Thread Leigh Dodds
Hi Kingsley:

I recommend you take some time to work with Refine, watch the demos,
and perhaps read the paper that Richard et al. published on how they
have used and extended Refine (or Gridworks as it was).

But to answer your question:

On 12 November 2010 13:23, Kingsley Idehen kide...@openlinksw.com wrote:
 How does the DERI effort differ from yours, if at all?

They have produced a plugin that complements the ability to map a
table structure to a Freebase schema and graph, by providing the same
functionality for RDF. In short, it's a simple way to define how RDF
should be generated from data in a Refine project, using either
existing or custom schemas.

The end result can then be exported using various serialisations.

My extension simply extends that further by providing the ability to
POST the data to a Talis Platform store. It'd be trivial to tweak that
code to support POSTing to another resource, or wrapping the data in a
SPARUL insert.

Ideally it'd be nice to roll the core of this into the DERI extension
for wider use.

Cheers,

L.
-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: isDefinedBy and isDescribedBy, Tale of two missing predicates

2010-11-05 Thread Leigh Dodds
Hi,

On 5 November 2010 08:51, Dave Reynolds dave.e.reyno...@gmail.com wrote:
 On Thu, 2010-11-04 at 20:58 -0400, Kingsley Idehen wrote:

 When you create hypermedia based structured data for deployment on an
 HTTP network (intranet, extranet, World Wide Web) do include a
 relation that associates each Subject/Entity (or Data Item) with its
 container/host document. A suitable predicate for this is:
 wdrs:describedBy [2] .

 Ian mentioned this predicate in his post.

 Looking at [1] the range of wdrs:describedBy is given as the class of
 POWDER documents and is a subclass of owl:Ontology, which seems to make
 it unsuitable as a general predicate for the purpose being discussed
 here.

Yes, I was going to point out the same thing and suggest that the FOAF
topic terms are a better fit (and already in use in a number of
places).
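
E.g. (URIs invented):

  <http://example.org/doc/alice>
    foaf:primaryTopic <http://example.org/id/alice> .

  <http://example.org/id/alice>
    foaf:isPrimaryTopicOf <http://example.org/doc/alice> .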

Cheers,

L.
-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary?

2010-11-05 Thread Leigh Dodds
Hi,

On 4 November 2010 18:42, Kingsley Idehen kide...@openlinksw.com wrote:
 Nobody ever mandated 303 redirection.

I've never encountered anyone in the community who has recently
advocated (i.e. since the httpRange-14 discussion), nor any
documentation that promotes, anything other than using # URIs or the
303 redirect approach.

So in the circumstance where someone doesn't want to use a # URI, what
options are available? Can you point to a document that illustrates an
alternate approach? It's not really a choice if there's only one option.

You also frequently cite dbpedia as the de facto standard model for
publishing Linked Data. This uses the 303 pattern, giving further
prominence to that approach.

 It has always been an option, and so it should remain.

But if there are other options that have a better mix of advantages
and disadvantages than the two that the community has promoted thus
far, then we should include those too.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary?

2010-11-05 Thread Leigh Dodds
Hi Nathan,

On 4 November 2010 18:08, Nathan nat...@webr3.org wrote:
 You see it's not about what we say, it's about what other say, and if 10
  huge corps analyse the web and spit out billions of triples saying
 that anything 200 OK'd is a document, then at the end when we consider
 the RDF graph of triples, all we're going to see is one statement saying
 something is a nonInformationResource and a hundred others saying it's
 a document and describing what it's about together with it's format and
 so on.

Are you suggesting that Linked Data crawlers could/should look at the
status code and use that to infer new statements about the resources
returned? If so, I think that's the first time I've seen that
mentioned, and am curious as to why someone would do it. Surely all of
the useful information is in the data itself.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary?

2010-11-05 Thread Leigh Dodds
Hi David,

On 4 November 2010 19:57, David Wood da...@3roundstones.com wrote:
 Some small number of people and organizations need to provide back-links on 
 the Web since the Web doesn't have them.
 303s provide a generic mechanism for that to occur.  URL curation is a useful 
 and proper activity on the Web, again in my opinion.

I agree that URL curation is a useful and proper activity on the Web.
I'm not clear on your core concern though. It looks like you're
asserting that HTTP 303 status codes, in general, are useful and
should not be deprecated. Totally agree there. But Ian's proposal is
about using 303 as a necessary part of publishing Linked Data. That
seems distinct from how services like PURLs and DOIs operate, and from
the value they provide. But perhaps I'm misunderstanding?

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary?

2010-11-05 Thread Leigh Dodds
Hi,

On 4 November 2010 17:51, Nathan nat...@webr3.org wrote:
 But, for whatever reasons, we've made our choices, each has pros and
 cons, and we have to live with them - different things have different
 names, and the giant global graph is usable. Please, keep it that way.

I think it's useful to continually assess the state of the art to see
whether we're on track. My experience, which seems to be confirmed by
comments from other people on this thread, is that we're seeing push
back from the wider web community -- who have already published way
more data than we have -- on the technical approach we've been
advocating, so looking for a middle ground seems useful.

Different things do have different names, but conflating IR/NIR is not
part of Ian's proposal which addresses the publishing mechanism only.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



What would break, a question for implementors? (was Re: Is 303 really necessary?)

2010-11-05 Thread Leigh Dodds
Hi Michael,

On 5 November 2010 09:29, Michael Hausenblas
michael.hausenb...@deri.org wrote:
 It occurs to me that one of the main features of the Linked Data community
 is that we *do* things rather than having endless conversations what would
 be the best for the world out there. Heck, this is how the whole thing
 started. A couple of people defining a set of good practices and providing
 data following these practices and tools for it.

 Concluding. If you are serious about this, please go ahead. You have a very
 popular and powerful platform at your hand. Implement it there (and in your
 libraries, such as Moriarty), document it, and others may/will follow.

Yes, actually doing things does help more than talking. I sometimes
wonder whether as a community we're doing all the right things, but
that's another discussion ;)

Your suggestion about forging ahead is a good one, but it also reminds
me of Ian's original question: what would break if we used this
pattern?

So here's a couple of questions for those of you on the list who have
implemented Linked Data tools, applications, services, etc:

* Do you rely on or require HTTP 303 redirects in your application? Or
does your app just follow the redirect?
* Would your application/tool/service/etc break or generate inaccurate
data if Ian's pattern were used to publish Linked Data?

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Inferring data from network interactions (was Re: Is 303 really necessary?)

2010-11-05 Thread Leigh Dodds
Hi,

On 5 November 2010 09:54, William Waites w...@styx.org wrote:
 On Fri, Nov 05, 2010 at 09:34:43AM +, Leigh Dodds wrote:

 Are you suggesting that Linked Data crawlers could/should look at the
 status code and use that to infer new statements about the resources
 returned? If so, I think that's the first time I've seen that
 mentioned, and am curious as to why someone would do it. Surely all of
 the useful information is in the data itself.

 Provenance and debugging. It would be quite possible to
 record the fact that this set of triples, G, were obtained
 by dereferencing this uri N, at a certain time, from a
 certain place, with a request that looked like this and a
 response that had these headers and response code. The
 class of information that is kept for [0]. If N appeared
 in G, that could lead directly to inferences involving the
 provenance information. If later reasoning is concerned at
 all with the trustworthiness or up-to-dateness of the
 data it could look at this as well.

Yes, I've done something similar to that in the past when I added
support for the ScutterVocab [1] to my crawler.

It was the suggestion of inferring information directly from 200/303
responses that I was most curious about. I've argued for inferring data
from 301 in the past [2], but wasn't sure of the merit of introducing
data based on the other interactions.
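
For the 301 case the inference was roughly the following (URIs
invented, and whether owl:sameAs is the right relation is exactly the
debatable part):

  # <http://example.org/old> returned 301 Moved Permanently
  # with Location: http://example.org/new
  <http://example.org/old> owl:sameAs <http://example.org/new> .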

 Keeping this quantity of information around might quickly
 turn out to be too data-intensive to be practical, but
 that's more of an engineering question. I think it does
 make some sense to do this in principle at least.

That's what I found when crawling the BBC pages. Huge amounts of data
and overhead in storing it. Capturing just enough to gather statistics
on the crawl was sufficient.

Cheers,

L.

[1]. http://wiki.foaf-project.org/w/ScutterVocab
[2]. http://www.ldodds.com/blog/2007/03/the-semantics-of-301-moved-permanently/

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: What would break, a question for implementors? (was Re: Is 303 really necessary?)

2010-11-05 Thread Leigh Dodds
Hi Robert,

Thanks for the response, good to hear from an implementor.

On 5 November 2010 10:41, Robert Fuller robert.ful...@deri.org wrote:
 ...
 However... with regard to publishing ontologies, we could expect
 additional overhead if same content is delivered on retrieving different
 Resources for example http://example.com/schema/latitude and
 http://example.com/schema/longitude . In such a case ETag could be used
 to suggest the contents are identical, but not sure that is a practical
 solution. I expect that without 303 it will be more difficult in
 particular to publish and process ontologies.

This is useful to know, thanks. I don't think the ETag approach works,
as ETags are intended to version a specific resource, not to be carried
across resources.

One way to avoid the overhead is to strongly recommend # URIs for
vocabularies. This seems to be increasingly the norm. It also makes
them easier to work with (you often want the whole document).
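
E.g. with a hash-URI vocabulary (names invented), both

  http://example.com/schema#latitude
  http://example.com/schema#longitude

dereference to http://example.com/schema, so a consumer fetches the
whole vocabulary in a single request and the duplicate-content issue
above doesn't arise.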

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: isDefinedBy and isDescribedBy, Tale of two missing predicates

2010-11-05 Thread Leigh Dodds
Hi Dave

On 5 November 2010 12:35, Dave Reynolds dave.e.reyno...@gmail.com wrote:
 Yes but I don't think the proposal was to ban use of 303 but to add an
 alternative solution, a third way :)

 I have some sympathy with this. The situation I've faced several times
 of late is roughly this:

 ...
[snip]

Really nice summary Dave.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: isDefinedBy and isDescribedBy, Tale of two missing predicates

2010-11-05 Thread Leigh Dodds
Hi,

On 5 November 2010 12:43, Nathan nat...@webr3.org wrote:
 Dave Reynolds wrote:

 Clearly simply using # URIs solves this but people can be surprisingly
 reluctant to go that route.

 Why? I still don't understand the reluctance, any info on the technical
 non-made-up-pedantic reasons would be great.

Dave provided a pointer to TimBL's discussion, which had some comments.
There's also some brief discussion of the technical issues in the Cool
URIs paper, see [1].

[1]. http://www.w3.org/TR/cooluris/#choosing

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary - demo

2010-11-05 Thread Leigh Dodds
Hi,

On 5 November 2010 13:57, Giovanni Tummarello
giovanni.tummare...@deri.org wrote:
 I might be wrong but I don't like it much. Sindice would index it as 2
 documents.

 http://iandavis.com/2010/303/toucan
 http://iandavis.com/2010/303/toucan.rdf

Even though one returns a Content-Location?
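
As I understand the demo, the exchange looks roughly like this
(headers abbreviated):

  GET /2010/303/toucan HTTP/1.1
  Host: iandavis.com
  Accept: application/rdf+xml

  HTTP/1.1 200 OK
  Content-Location: http://iandavis.com/2010/303/toucan.rdf
  Content-Type: application/rdf+xml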

Cheers,

L.
-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary - demo

2010-11-05 Thread Leigh Dodds
Hi,

On 5 November 2010 12:37, Nathan nat...@webr3.org wrote:
 Wrong question; the correct question is: if I 200 OK, will people think
 this is a document? To which the answer is yes. Your toucan is a
 :Document.

You keep reiterating this, but I'm still not clear on what you're saying.

1. It seems like you're saying that a status code licenses someone to
infer an rdf:type for a resource (in what vocab I'm not sure, but it
looks like you're saying that). Someone is obviously entitled to do
that. Not sure I can think of a use case; do you have one?

2. It also seems like you're suggesting someone is actually doing
that. Or maybe you're expecting someone will start doing it?

3. It also seems like you're suggesting that if someone does do that,
then it breaks the (semantic) web for the rest of us. Which it won't,
unless you blithely trust all data everywhere or don't care to check
your facts.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: RDB to RDF ontology terms reuse

2010-11-05 Thread Leigh Dodds
Hi Christian,

On Friday, November 5, 2010, Christian Rivas chris.rivas@gmail.com wrote:

 foaf:firstName = Domain: foaf:Person Range: Literal
 foaf:familyName = Domain: foaf:Person Range: Literal
 foaf:phone = Domain: NONE Range = NONE
 vcard:email = Domain: vcard:VCard Range = NONE

Personally I would use all FOAF terms; foaf:mbox can be used to
capture an email address as a mailto: URI.
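
E.g. (all details invented):

  <http://example.org/person/1>
    a foaf:Person ;
    foaf:firstName "Jane" ;
    foaf:familyName "Doe" ;
    foaf:phone <tel:+15550100> ;
    foaf:mbox <mailto:jane@example.org> .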

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary?

2010-11-04 Thread Leigh Dodds
Hi,

On 4 November 2010 15:21, Giovanni Tummarello
giovanni.tummare...@deri.org wrote:
 ..but a number of social community mechanisms will activate if you
 bring this up, ranging from russian style you're being antipatriotic
 criticizing the existing status quo  to ..but its so deployed now
 and .. you're distracting the community from other more important
 issues , none of this will make sense if analized by proper logical
 means of course (e.g. by a proper IT manager in a proper company, paid
 based on actual results).

It'd be pretty unfortunate for us as a community if we couldn't
continually test our assumptions about technology and best practices,
especially in light of experience taking these technologies to a wider
audience. I don't think we're at that stage yet though :)

 But the core of the matter really is: who cares.

If that's true then simpler approaches to publishing data should be
not just embraced, but actively promoted and encouraged.

It's great to see spread of RDFa, but I don't see that as a complete answer.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary?

2010-11-04 Thread Leigh Dodds
Hi,

On 4 November 2010 13:22, Ian Davis m...@iandavis.com wrote:
 http://iand.posterous.com/is-303-really-necessary

I was minded to look back at the Cool URIs for the Semantic Web note,
which defines two criteria for naming real-world objects with URIs
[1]:

1. Be on the Web.
2. Be unambiguous. -- There should be no confusion between identifiers
for Web documents and identifiers for other resources...

I think this proposal still achieves those aims.

Cheers,

L.

[1]. http://www.w3.org/TR/cooluris/#semweb

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Concordance, Reconciliation, and shared identifiers

2010-10-23 Thread Leigh Dodds
Hi,

On Friday, October 22, 2010, Kingsley Idehen kide...@openlinksw.com wrote:
 On 10/22/10 11:47 AM, Leigh Dodds wrote:

 A great project would be for someone to produce a Linked Data wrapper
 for the Guardian API, that allows linking *in* to their data, based on
 ISBNs and MusicBrainz ids. It's on my TODO list, but then so is a lot
 of other stuff ;)

 We've had sponger meta cartridges [1] for the Guardian API since its
 early incarnations.

Do you have an actual example of that? I had a look at the docs and I
couldn't see how/where the Guardian API data was being surfaced. The
meta cartridge seems to pull a small amount of info from the Guardian
website, rather than the OpenPlatform.

 Again, it would be interesting to build bridges between different
 communities by showing how one can achieve the same effects with
 Linked Data, as well as integrating Linked Data into those services by
 providing gateway services, e.g. implementing the same API but backed
 by RDF. This is what I did for Gridworks, but the same could be
 extended to other services.

 On our part, we've been doing so since Linked Data inception, and will
 continue to do so

Well the more effort the better. Has anyone else explored the
boundaries between the Linked Data cloud and other APIs and services?

What do people think of more tailored lookup and access services onto
Linked Data, over and above simple 'follow your nose' access and SPARQL?

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Schema Mappings (was Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)

2010-10-23 Thread Leigh Dodds
Hi Antoine,

On Friday, October 22, 2010, Antoine Zimmermann
antoine.zimmerm...@insa-lyon.fr wrote:
 On 22/10/2010 17:23, Leigh Dodds wrote:
 This also strikes me as an opportunity: someone could usefully build a
 service (perhaps built on facilities in Sindice) that aggregated
 schema information and provides tools for expressing simple mappings
 and equivalencies. It could fill a dual role: recommend more
 common/preferred terms, whilst simultaneously providing
 machine-readable equivalencies.

 This sounds very much like what an ontology alignment server is doing:
 it provides alignments [often synonym with mappings] on demand (given
 two ontology URIs), either by retrieving locally stored alignments, or
 by asking another alignment server for an alignment that it may have, or
 by computing the alignment on the fly, given a certain direct matching
 algorithm or from the aggregation (e.g., composition) of existing
 alignments. The alignment server can also be used for various other
 things such as comparing alignments, evaluating them, rating them,
 updating them, etc.

 A paper describing the Alignment server [1] has been submitted to the
 Semantic Web Journal and is under open review (you can read the paper
 and the reviews and submit your own reviews or comments). The server
 itself can be downloaded and installed anywhere [2].

Interesting, thanks for the reference. I was aware that there has been
and continues to be a lot of research in this area, but was just
wondering out loud whether anyone has explored opening up some kind of
matching service on a more production footing, either as an automated
service or using crowd-sourced mappings.

Running the tools locally and exploring their effectiveness would be an
interesting exercise. But presumably there will be a need to start
surfacing some services in this area soon, as part of the general
semweb infrastructure.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Low Quality Data (was before Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)

2010-10-22 Thread Leigh Dodds
Hi,

On 22 October 2010 15:47, Juan Sequeda juanfeder...@gmail.com wrote:

 Martin and all,
 Can somebody point me to papers, or maybe give their definition of low
 quality data when it comes to LOD? What are the criteria for data to be
 considered low quality?

I asked this in the context of Linked Data on semantic overflow:

http://www.semanticoverflow.com/questions/1072/quality-indicators-for-linked-data-datasets

Some good discussion and pointers in there.

Cheers,

L.

--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Types of Data Source on the LOD Cloud

2010-10-22 Thread Leigh Dodds
Hi,

The LOD cloud analysis [1] is a really great piece of work. I wanted
to pick up on one aspect of the analysis for further discussion:
whether data is published by the data owner or a third party.

It seems to me that there are broadly three categories into which a
dataset might fall:

* Primary -- published and maintained directly by the data owner, e.g. BBC
* Secondary -- published and maintained by a third party, e.g. by
scraping, wrapping or otherwise converting a data source
* Tertiary -- published and maintained by a third party, usually a
mirror or aggregation of primary/secondary sources. This might be a
direct mirror, or involve some additional creativity, e.g.
re-modelling some aspects of another dataset. Mirrors typically
provide additional services, e.g. a SPARQL endpoint where primary
source doesn't provide one.

If we consider the different categories we can see that:

* Growth of the web of data is best served by encouraging more Primary
sources. The current community can't scale to add more Secondary
sources, so adoption is best driven by data owners.

* Sustainability and usage of Linked Data is best served by
encouraging more Tertiary sources. Availability of useful, current
aggregations of data, wrapped in services, will help drive more
consumption.

What do others think?

Cheers,

L.

[1]. http://www4.wiwiss.fu-berlin.de/lodcloud/state/

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Schema Mappings (was Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)

2010-10-22 Thread Leigh Dodds
Hi,

On 22 October 2010 09:35, Chris Bizer ch...@bizer.de wrote:
 Anja has pointed to a wealth of openly
 available numbers (no pun intended), that have not been discussed at all.
 For
 example, only 7.5% of the data source provide a mapping of proprietary
 vocabulary terms to other vocabulary terms. For anyone building
 applications to work with LOD, this is a real problem.

 Yes, this is also the figure that scared me most.

This might be low for a good reason: people may be creating
proprietary terms because they don't feel well served by existing
vocabularies and hence defining mappings (or even just reusing terms)
may be difficult or even impossible.

This also strikes me as an opportunity: someone could usefully build a
service (perhaps built on facilities in Sindice) that aggregated
schema information and provides tools for expressing simple mappings
and equivalencies. It could fill a dual role: recommend more
common/preferred terms, whilst simultaneously providing
machine-readable equivalencies.
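
E.g. a publisher with proprietary terms could put something like this
alongside their schema (the ex: terms are invented):

  ex:fullName rdfs:subPropertyOf foaf:name .
  ex:Author rdfs:subClassOf foaf:Person .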

I know that Uberblic provides some mapping tools in this area,
allowing for the creation of a more normalized view across the web,
but I'm not sure how much of that is resurfaced.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Concordance, Reconciliation, and shared identifiers

2010-10-22 Thread Leigh Dodds
Hi,

The announcement that the Guardian has begun cataloguing other
identifiers (e.g. ISBN, Musicbrainz) within its API [1] is a nice
illustration that the importance of cross-linking between datasets is
starting to become more generally accepted. Setting aside the debate
about what constitutes linked data, I think it's important that this
community tracks these various initiatives to help explore the
trade-offs between different approaches, as well as to build bridges
with the wider developer community.
A great project would be for someone to produce a Linked Data wrapper
for the Guardian API, that allows linking *in* to their data, based on
ISBNs and MusicBrainz ids. It's on my TODO list, but then so is a lot
of other stuff ;)

If we look back a few months we can see signs of the importance of
cross-linking appearing in other projects. Google Refine (nee Freebase
Gridworks) has the notion of a reconciliation service that is used
to build and set links [2]. Yahoo meanwhile have their concordance
service [3, 4] which is basically a sameAs.org service for building
cross-links between geo data.

Again, it would be interesting to build bridges between different
communities by showing how one can achieve the same effects with
Linked Data, as well as integrating Linked Data into those services by
providing gateway services, e.g. implementing the same API but backed
by RDF. This is what I did for Gridworks, but the same could be
extended to other services.

Cheers,

L.

[1]. http://www.guardian.co.uk/open-platform/blog/linked-data-open-platform
[2]. 
http://www.ldodds.com/blog/2010/08/gridworks-reconciliation-api-implementation/
[3]. 
http://blog.programmableweb.com/2010/04/05/yahoos-new-geo-concordance-a-geographic-rosetta-stone/
[4]. 
http://developer.yahoo.com/geo/geoplanet/guide/api-reference.html#api-concordance

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Reification alternative

2010-10-14 Thread Leigh Dodds
Hi Mirko,

On 13 October 2010 14:02, Mirko
idonthaveenoughinformat...@googlemail.com wrote:
 Hi all,
 I try to understand alternatives to reification for Linked Data publishing,
 since reification is discouraged. For example, how could I express the
 following without reification:
 @prefix dc: <http://purl.org/dc/elements/1.1/>.
 @prefix foaf: <http://xmlns.com/foaf/0.1/>.
 <http://ex.org/stmt>
   rdfs:label "Statement that describes user interest in a document"@de;
   rdf:subject <http://ex.org/User>;
   rdf:predicate foaf:interest;
   rdf:object <http://ex.org/Item>;
   dc:publisher <http://ex.org/Service>;
   dc:created "2010-10-13"^^xsd:date;
   dc:license <http://ex.org/License>.
 <http://ex.org/User> rdf:type foaf:Agent.
 <http://ex.org/Item> rdf:type foaf:Document.

Why not just model this as an N-Ary relationship [1, 2]? This allows
you to explicitly model the event of someone expressing interest in a
document. You can still infer and publish more direct relationships
(e.g. foaf:interest) from the more structured version [3]. Formally
publishing that derivation using rules or OWL would also be useful.

I'm not convinced that you need to jump immediately to Named Graphs,
Quads, or specific software features here. Especially as they may not
make it easier to either publish or consume the data.

If it's useful data, then just model it!

Cheers,

L.

[1]. http://www.w3.org/TR/swbp-n-aryRelations/
[2]. http://patterns.dataincubator.org/book/nary-relation.html
[3]. http://patterns.dataincubator.org/book/ch04s07.html

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com


