Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, On 19 October 2011 23:10, Jonathan Rees j...@creativecommons.org wrote: On Wed, Oct 19, 2011 at 5:29 PM, Leigh Dodds leigh.do...@talis.com wrote: Hi Jonathan I think what I'm interested in is what problems might surface and approaches for mitigating them. I'm sorry, the writeup was designed to do exactly that. In the example in the conflict section, a miscommunication (unsurfaced disagreement) leads to copyright infringement. Isn't that a problem? Yes it is, and these are the issues I think are worth teasing out. I'm afraid though that I'll have to admit to not understanding your specific example. There's no doubt some subtlety that I'm missing (and a rotten head cold isn't helping). Can you humour me and expand a little? The bit I'm struggling with is:

[[[ http://example/x xhv:license http://creativecommons.org/licenses/by/3.0/. According to D2, this says that document X is licensed. According to S2, this says that document Y is licensed ]]]

Taking the RDF data at face value, I don't see how the D2 and S2 interpretations differ. Both say that http://example/x has a specific license. How could an S2-assuming client assume that the data is actually about another resource? I looked at your specific examples, e.g. Flickr and Jamendo:

The RDFa extracted from the Flickr photo page does seem to be ambiguous. I'm guessing the intent is to describe the license of the photo and not the web page. But in that case, isn't the issue that Flickr aren't being precise enough in the data they're returning?

The RDFa extracted from the Jamendo page includes type information (from the Open Graph Protocol) that says that the resource is an album, and has a specific Creative Commons license. I think that's what's intended, isn't it?

Why does a client have to assume a specific stance (D2/S2)? Why not simply take the data returned at face value? It's then up to the publisher to be sure that they're making clear assertions.
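Leigh's question above, how the D2 and S2 readings of the same triple can come apart, can be sketched in a few lines. This is only an illustration of the two conventions, not real data: the URIs are the hypothetical ones from Jonathan's example, and the two functions stand in for a D2-assuming and an S2-assuming client.

```python
# Illustration of the D2/S2 ambiguity discussed above. The triple is the
# hypothetical one from Jonathan's example; neither URI names real data.
TRIPLE = ("http://example/x",
          "xhv:license",
          "http://creativecommons.org/licenses/by/3.0/")

def licensed_work_d2(triple):
    """D2 (direct): the subject URI names the document retrieved from it."""
    subject, _, _ = triple
    return "the document served at " + subject

def licensed_work_s2(triple):
    """S2 (indirect): the subject URI names whatever that document describes."""
    subject, _, _ = triple
    return "the thing described by the document at " + subject

# Same triple, two different answers to "what is licensed?":
print(licensed_work_d2(TRIPLE))
print(licensed_work_s2(TRIPLE))
```

The triple itself is identical in both cases; only the convention for what the subject URI denotes changes, which is why nothing in the data can settle the question.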
There is no heuristic that will tell you which of the two works is licensed in the stated way, since both interpretations are perfectly meaningful and useful. For mitigation in this case you only have a few options:

1. precoordinate (via a disambiguating rule of some kind, any kind)
2. avoid using the URI inside ... altogether - come up with distinct wads of RDF for the 2 documents
3. say locally what you think ... means, effectively treating these URIs as blank nodes

Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, On 20 October 2011 13:25, Ed Summers e...@pobox.com wrote: On Wed, Oct 19, 2011 at 12:59 PM, Leigh Dodds leigh.do...@talis.com wrote: So, can we turn things on their head a little? Instead of starting out from a position that we *must* have two different resources, can we instead highlight to people the *benefits* of having different identifiers? That makes it more of a best practice discussion and one based on trade-offs: e.g. this class of software won't be able to process your data correctly, or you'll be limited in how you can publish additional data or metadata in the future. I don't think I've seen anyone approach things from that perspective, but I can't help but think it'll be more compelling. And it also has the benefit of not telling people that they're right or wrong, but of just illustrating what trade-offs they are making. I agree Leigh. The argument that you can't deliver an entity like a Galaxy to someone's browser sounds increasingly hollow to me. Nobody really expects that, and the concept of a Representation from WebArch/REST explains it away to most technical people. Plus, we now have examples in the wild like OpenGraphProtocol that seem to be delivering drinks, politicians, hotels, etc to machine agents at Facebook just fine. It's the arrival of the OpenGraphProtocol which I think warrants a more careful discussion. It seems to me that we no longer have to try so hard to convince people of the value of giving things de-referenceable URIs that return useful data. It's happening now, and there's immediate and obvious benefit, i.e. integration with Facebook, better search ranking, etc. But there does seem to be a valid design pattern, or even refactoring pattern, in httpRange-14 that is worth documenting. Refactoring is how I've been thinking about it too, i.e. under what situations might you want to have separate URIs for a resource and its description? Dave Reynolds has given some good examples of that.
Perhaps a good place would be http://patterns.dataincubator.org/book/? I think positioning httpRange-14 as a MUST instead of a SHOULD or MAY made a lot of sense to get the LOD experiment rolling. It got me personally thinking about the issue of identity in a practical way as I built web applications, which I probably wouldn't otherwise have done. But it would've been easier if grappling with it was optional, and there were practical examples of where it is useful, instead of having it be an issue of dogma. My personal viewpoint is that it has to be optional, because there's already a growing set of deployed examples of people not doing it (OGP adoption), so how can we help those users understand the pitfalls and/or the benefits of a slightly cleaner approach? We can also help them understand how best to publish data to avoid mis-interpretation. To simplify ridiculously just to make a point, we seem to have the following situation:

* Create de-referenceable URIs for things. Describe them with OGP and/or Schema.org. Benefit: Facebook integration, SEO
* The above plus additional # URIs or 303s. Benefit: ability to make some finer-grained assertions in some specific scenarios. Tabulator is happy

Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi Dave, Thanks for the response, there's some good examples in there. I'm glad that this thread is bearing fruit :) I had a question about one aspect, please excuse the clipping: On 20 October 2011 10:34, Dave Reynolds dave.e.reyno...@gmail.com wrote: ... If you have two resources and later on it turns out you only needed one, no big deal just declare their equivalence. If you have one resource where later on it turns out you needed two then you are stuffed. Ed referred to refactoring. So I'm curious about refactoring from a single URI to two. Are developers necessarily stuffed, if they start with one and later need two? For example, what if I later changed the way I'm serving data to add a Content-Location header (something that Ian has raised in the past, and Michael has mentioned again recently) which points to the source of the data being returned. Within the returned data I can include statements about the document at that URI referred to in the Content-Location header. Doesn't that kind of refactoring help? Presumably I could also just drop in a redirect and adopt the current 303 pattern without breaking anything? Again, I'm probably missing something, but I'm happy to admit ignorance if that draws out some useful discussion :) Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, On 20 October 2011 23:19, Kingsley Idehen kide...@openlinksw.com wrote: On 10/20/11 5:31 PM, Dave Reynolds wrote: What's more, I really don't think the issue is about not understanding the distinction (at least in the clear-cut cases). Most people I talk to grok the distinction; the hard bit is understanding why 303 redirects are a sensible way of making it, and caring about it enough to put those in place. What about separating the concept of indirection from its actual mechanics? Thus, conversations about benefits will then have the freedom to blossom. Here's a short list of immediately obvious benefits re. Linked Data (at any scale):

1. access to data via data source names -- millions of developers world wide already do this with ODBC, JDBC, ADO.NET, OLE DB etc.; the only issue is that they are confined to relational database access and all its shortcomings
2. integration of heterogeneous data sources -- the ability to coherently source and merge disparately shaped data culled from a myriad of data sources (e.g. blogs, wikis, calendars, social media spaces and networks, and anything else that's accessible by name or address reference on a network)
3. crawling and indexing across heterogeneous data sources -- where the end product is persistence to a graph model database or store that supports declarative query language access via SPARQL (or even better a combination of SPARQL and SQL)
4. etc...

Why is all of this important? Data access, integration, and management has been a problem that's straddled every stage of computer industry evolution. Managers and end-users always think about data conceptually, but continue to be forced to deal with access, integration, and management in application logic oriented ways. In a nutshell, applications have been silo vectors forever, and in doing so they stunt the true potential of computing which (IMHO) is ultimately about our collective quests for improved productivity.
No matter what we do, there are only 24 hrs in a day. Most humans taper out at 5-6 hrs before physiological system faults kick in, hence our implicit dependency on computers for handling voluminous and repetitive tasks. Are we there yet? Much closer than most imagine. Our biggest hurdle (as a community of Linked Data oriented professionals) is a protracted struggle re. separating concepts from implementation details. We burn too much time fighting implementation-detail-oriented battles at the expense of grasping core concepts. Maybe I'm wrong, but I think people, especially on this list, understand the overall benefits you itemize. The reason we talk about implementation details is that they're important to help people adopt the technology: we need specific examples. We get the benefits you describe from inter-linked dereferenceable URIs, regardless of what format or technology we use to achieve it. Using the RDF model brings additional benefits. What I'm trying to draw out in this particular thread is the specific benefits the #/303 additional abstraction brings. At the moment, they seem pretty small in comparison to the fantastic benefits we get from data integrated into the web. Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi Leigh, On 21/10/2011 08:04, Leigh Dodds wrote: Hi Dave, Thanks for the response, there's some good examples in there. I'm glad that this thread is bearing fruit :) I had a question about one aspect, please excuse the clipping: Clipping is the secret to focused email discussions :) On 20 October 2011 10:34, Dave Reynoldsdave.e.reyno...@gmail.com wrote: ... If you have two resources and later on it turns out you only needed one, no big deal just declare their equivalence. If you have one resource where later on it turns out you needed two then you are stuffed. Ed referred to refactoring. So I'm curious about refactoring from a single URI to two. Are developers necessarily stuffed, if they start with one and later need two? For example, what if I later changed the way I'm serving data to add a Content-Location header (something that Ian has raised in the past, and Michael has mentioned again recently) which points to the source of the data being returned. Within the returned data I can include statements about the document at that URI referred to in the Content-Location header. Doesn't that kind of refactoring help? Helps yes, but I don't think it solves everything. Suppose you have been using http://example.com/lovelypictureofm31 to denote M31. Some data consumers use your URI to link their data on M31 to it. Some other consumers started linking to it in HTML as an IR (because they like the picture and the accompanying information, even though they don't care about the RDF). Now you have two groups of users treating the URI in different ways. This probably doesn't matter right now but if you decide later on you need to separate them then you can't introduce a new URI (whether via 303 or content-location header) without breaking one or other use. Not the end of the world but it's not a refactoring if the test cases break :) Does that make sense? Dave
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, On 21 October 2011 08:47, Dave Reynolds dave.e.reyno...@gmail.com wrote: ... On 20 October 2011 10:34, Dave Reynoldsdave.e.reyno...@gmail.com wrote: ... If you have two resources and later on it turns out you only needed one, no big deal just declare their equivalence. If you have one resource where later on it turns out you needed two then you are stuffed. Ed referred to refactoring. So I'm curious about refactoring from a single URI to two. Are developers necessarily stuffed, if they start with one and later need two? For example, what if I later changed the way I'm serving data to add a Content-Location header (something that Ian has raised in the past, and Michael has mentioned again recently) which points to the source of the data being returned. Within the returned data I can include statements about the document at that URI referred to in the Content-Location header. Doesn't that kind of refactoring help? Helps yes, but I don't think it solves everything. Suppose you have been using http://example.com/lovelypictureofm31 to denote M31. Some data consumers use your URI to link their data on M31 to it. Some other consumers started linking to it in HTML as an IR (because they like the picture and the accompanying information, even though they don't care about the RDF). Now you have two groups of users treating the URI in different ways. This probably doesn't matter right now but if you decide later on you need to separate them then you can't introduce a new URI (whether via 303 or content-location header) without breaking one or other use. Not the end of the world but it's not a refactoring if the test cases break :) Does that make sense? No, I'm still not clear. If I retain the original URI as the identifier for the galaxy and add either a redirect or a Content-Location, then I don't see how I break those linking their data to it as their statements are still made about the original URI. 
But I don't see how I'm breaking people linking to it as if it were an IR. That group of people are using my resource ambiguously in the first place. Their links will also still resolve to the same content. L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
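The refactoring Leigh describes, keeping the original URI as the identifier for the galaxy and dropping in a 303 redirect to a new document URI, can be sketched as server-side behaviour. Both URIs here are hypothetical (the document URI is invented for the example), and the handler is reduced to a pure function for illustration.

```python
# Hypothetical URIs for the sketch: the original (thing) URI from Dave's
# M31 example, plus an invented URI for the description document.
THING_URI = "http://example.com/lovelypictureofm31"
DOC_URI = "http://example.com/doc/lovelypictureofm31"

def respond(request_uri):
    """Return (status, headers) for a request, after the 303 refactoring."""
    if request_uri == THING_URI:
        # Data consumers' RDF statements still name the original URI, and
        # HTML links still resolve: clients just follow the redirect.
        return 303, {"Location": DOC_URI}
    if request_uri == DOC_URI:
        # The description document now has a URI of its own to be
        # described at (license, authorship, etc.).
        return 200, {"Content-Type": "text/html"}
    return 404, {}
```

Browsers follow the 303 transparently, so links to the original URI keep working either way; the open question in the thread is whether the people who linked to it meant the document or the galaxy.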
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
On Fri, Oct 21, 2011 at 2:42 AM, Leigh Dodds leigh.do...@talis.com wrote: Hi, On 19 October 2011 23:10, Jonathan Rees j...@creativecommons.org wrote: On Wed, Oct 19, 2011 at 5:29 PM, Leigh Dodds leigh.do...@talis.com wrote: Hi Jonathan I think what I'm interested in is what problems might surface and approaches for mitigating them. I'm sorry, the writeup was designed to do exactly that. In the example in the conflict section, a miscommunication (unsurfaced disagreement) leads to copyright infringement. Isn't that a problem? Yes it is, and these are the issues I think are worth teasing out. I'm afraid though that I'll have to admit to not understanding your specific example. There's no doubt some subtlety that I'm missing (and a rotten head cold isn't helping). Can you humour me and expand a little? The bit I'm struggling with is: [[[ http://example/x xhv:license http://creativecommons.org/licenses/by/3.0/. According to D2, this says that document X is licensed. According to S2, this says that document Y is licensed ]]] Taking the RDF data at face value, I don't see how the D2 and S2 interpretations differ. Both say that http://example/x has a specific license. How could an S2-assuming client assume that the data is actually about another resource? By observing D2. D2 is the page at that URI; it is not what is described by the page. For example, one describes the image, while the other doesn't. You get different answers. I'm not sure how to be more clear. I looked at your specific examples, e.g. Flickr and Jamendo: The RDFa extracted from the Flickr photo page does seem to be ambiguous. I'm guessing the intent is to describe the license of the photo and not the web page. But in that case, isn't the issue that Flickr aren't being precise enough in the data they're returning? If you adopt the httpRange-14 rule, what this does is make the Flickr and Jamendo pages wrong, and if *they* agree, they will change their metadata.
The eventual advantage is that there will be no need to be clear since a different URI (or blank node) will clearly be used to name the photo, and will be understood in that way. I feel you're doing a bait-and-switch here. The topic is, what does the httpRange-14 rule do for you, NOT whether a different rule (such as just read the RDF) is better than it for some purposes, or what sort of agreement might we want to attempt. If you want to do a comparison of different rules, please change the subject line. To summarize:

- A rule is something that helps eliminate judgment and uncertainty, and, ideally, facilitates automated processing.
- These URIs (hashless retrieval-enabled ones) are currently being used in two different and incompatible ways. In the issue-57 document I call these ways direct (it's the document found there) and indirect (just read the RDF).
- If there is no rule, then you can't use one of these URIs without further explanation as to which way is meant (being clear). Maybe that's OK.
- Any particular rule will assign 0 or more URIs as direct and 0 or more as indirect. Any time any URI is assigned *either* way some benefit will ensue to someone, because uses of the URI in that way will not require further explanation.
- The httpRange-14 rule assigns one of the two ways to all affected URIs. The advantage is that people who want to use URIs in this way will be able to use them in this way, and be understood. That is, it gives you a way to refer to anything on the web - even if you don't know how to read its content, don't trust the content, etc. It is a legacy solution since it grandfathers everything that was on the web before we started using URIs in these new and different ways.
- Other rules will have advantages in other situations. What the httpRange-14 rule does for you can be understood independently of the virtues of other rules, such as the one Ian Davis put forth last fall, or the more radical rule that says that all such URIs are indirect.
What httpRange-14 does for you is a different matter from whether something else is better. If you want to shift to comparison shopping, please change the subject line. The RDFa extracted from the Jamendo page includes type information (from the Open Graph Protocol) that says that the resource is an album, and has a specific Creative Commons license. I think that's what's intended, isn't it? Why does a client have to assume a specific stance (D2/S2)? Why not simply take the data returned at face value? It's then up to the publisher to be sure that they're making clear assertions. Taking the information at face value *is* a stance - that's exactly the S2 (indirect) approach. Saying that all hashless retrieval-enabled URIs are indirect (S2) would be a perfectly principled and coherent approach, it's just not the one the TAG advised in 2005. You have to take a stand (if you use these URIs without somehow specifying the mode) because in almost all cases D2 and S2 give different answers.
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
On Fri, Oct 21, 2011 at 8:15 AM, Jonathan Rees j...@creativecommons.org wrote: How could an S2-assuming client assume that the data is actually about another resource? By observing D2 Sorry, I'm speaking nonsense. The point is that if you assume S2 or you assume D2, you'll know (or you'll think you know) what is being talked about. But you'll get different answers in the two situations. If D2 (direct reference) is assumed uniformly - no problem. If S2 (indirect) is assumed - no problem. If sometimes one and sometimes the other - chaos, if there is no consensus rule that clearly says when one and when the other. httpRange-14 is just one such possible rule, and it shares with the other candidate rules the benefit of being a rule. Jonathan
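Jonathan's point that httpRange-14 is just "one such possible rule" can be made concrete by writing the rule down as a tiny classifier. This is a sketch of one reading of the rule, not normative text: a hash URI is always an indirect reference, and for a hashless URI the response code observed on dereference decides.

```python
def reference_mode(uri, status=None):
    """Classify a URI reference under the httpRange-14 rule (a sketch).

    Returns "indirect" when the URI cannot be naming the retrieved
    document (hash URI, or 303 on dereference), "direct" when a 2xx
    response means it names the document itself, "unknown" otherwise.
    """
    if "#" in uri:
        return "indirect"  # hash URIs never name the retrieved document
    if status is not None and 200 <= status < 300:
        return "direct"    # 2xx: the URI names an information resource
    if status == 303:
        return "indirect"  # 303: the URI may name anything at all
    return "unknown"       # the rule gives no verdict without a response

# The value of ANY uniform rule, this one or another, is that every
# client computes the same answer; with no rule, each client guesses.
```

A client that assumes S2 everywhere would instead return "indirect" unconditionally; that would be an equally consistent rule, just a different one, which is exactly the comparison Jonathan is setting aside.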
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Nathan, hello. On 2011 Oct 20, at 12:54, Nathan wrote: Norman Gray wrote: Ugh: 'IR' and 'NIR' are ugly obscurantist terms (though reasonable in their original context). Wouldn't 'Bytes' and 'Thing', respectively, be better (says he, plaintively)? Both are misleading, since NIR is the set of all things, and IR is a proper subset of NIR, it doesn't make much sense to label it non information resource(s) when it does indeed contain information resources. From that perspective IR and R makes somewhat more sense. That's true, and clarifying. Or, more formally, R is the set of all resources (?equivalent to things named by a URI). IR is a subset of that, defined as all the things which return 200 when you dereference them. NIR is then just R \ IR. It's NIR that's of interest to this discussion, but there's no way of indicating within HTTP that a resource is in that set [1], only that something is in IR. Back to your regularly scheduled argumentation... Norman [1] Though there is, implicitly, within any RDF that one might subsequently receive -- Norman Gray : http://nxg.me.uk SUPA School of Physics and Astronomy, University of Glasgow, UK
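Norman's set-theoretic framing can be restated in a few executable lines. The member URIs below are invented purely for illustration; the only real content is the relationship that IR is a proper subset of R and NIR = R \ IR, with membership in NIR never directly observable via HTTP.

```python
# Toy members, invented for illustration only.
R = {"http://example/page", "http://example/galaxy", "http://example/alice"}

# IR: the subset of things that return 200 (with content) on dereference.
IR = {"http://example/page"}

# NIR is defined only by subtraction, not by any observable HTTP
# behaviour -- which is Norman's point: HTTP can signal membership
# in IR, but never membership in NIR.
NIR = R - IR

assert IR < R            # IR is a proper subset of R
assert NIR.isdisjoint(IR)
```

On Nathan's earlier objection, this also shows why "NIR" is an awkward name: nothing stops a thing in the complement set from carrying information; it simply is not dereferenceable to a 200.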
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
On Fri, Oct 21, 2011 at 1:15 PM, Jonathan Rees j...@creativecommons.orgwrote: If you adopt the httpRange-14 rule, what this does is make the Flickr and Jamendo pages wrong, and if *they* agree, they will change their metadata. The eventual advantage is that there will be no need to be clear since a different URI (or blank node) will clearly be used to name the photo, and will be understood in that way. I feel you're doing a bait-and-switch here. The topic is, what does the httpRange-14 rule do for you, NOT whether a different rule (such as just read the RDF) is better than it for some purposes, or what sort of agreement might we want to attempt. If you want to do a comparison of different rules, please change the subject line. I don't think this was a bait-and-switch. I think Leigh made clear that he was questioning whether we should spend so much time making pages (and people) wrong. As he said: Instead of starting out from a position that we *must* have two different resources, can we instead highlight to people the *benefits* of having different identifiers? Telling someone they are wrong because they don't follow a rule that they don't understand or don't see a benefit to is a *must* position. Explaining how the httpRange-14 rule is better than another is explaining the *benefits* of having different identifiers. -Lin
Fully Funded PhD Studentship @ DDIS at the University of Zurich
The Dynamic and Distributed Information Systems Group at the University of Zurich (http://www.ifi.uzh.ch/ddis) is looking for a *research doctoral student* with a keen interest in: * Large-scale Graph Processing * Semantic Web / Linked Data * Distributed Computing to work on Signal/Collect - a framework for synchronous and asynchronous parallel graph processing that allows programmers to express many algorithms on graphs in a concise and elegant way. More information on Signal/Collect can be found on our project page (http://www.ifi.uzh.ch/ddis/research/sc.html http://code.google.com/p/signal-collect/). We offer: * motivated colleagues who are passionate about research * a work environment that is well equipped with the newest hardware and software technology * a salary according to the standard university regulations ( 57'000 CHF / year; increases with experience) * support for your personal development and career planning * an attractive work environment both within the research group and beyond (Zurich is repeatedly voted the city with the highest standard of living in the world) * A highly successful PhD program with graduates at top rated institutions world-wide You will be collaborating in a larger research team consisting of the Dynamic and Distributed Information Systems Group (DDIS)(http://www.ifi.uzh.ch/ddis/) headed by Prof. Abraham Bernstein, which is part of the Department of Informatics of the University of Zurich. The group is active in International and Swiss National research projects and we are looking for candidates to help us continue these efforts. 
You have: * a master's degree in informatics, computer science (or an equivalent university study) * expertise in database systems, distributed computing, or Semantic Web / Linked Data (note: only expertise in one of the fields is necessary; multiple areas is desirable) * good programming skills in several languages (Scala a plus) * an interest in applying computer science research to real-world problems * excellent command of English * German is a plus but not required If you are interested: Send your application (including CV, final grades, and - if possible - thesis copy as a PDF file) via e-mail to: Prof. Abraham Bernstein, Ph.D. Department of Informatics University of Zurich, Switzerland http://www.ifi.uzh.ch/ddis/bernstein.html Email: ddisjobs at lists dot ifi dot uzh dot ch The University of Zurich is committed to enhancing the number of women in scientific positions and, therefore particularly invites women to apply. Women who are as qualified for the position in question as male applicants will be given priority. -- - | Professor Abraham Bernstein, PhD | University of Zürich, Department of Informatics | web:http://www.ifi.uzh.ch/ddis/bernstein.html
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Norman Gray wrote: Nathan, hello. On 2011 Oct 20, at 12:54, Nathan wrote: Norman Gray wrote: Ugh: 'IR' and 'NIR' are ugly obscurantist terms (though reasonable in their original context). Wouldn't 'Bytes' and 'Thing', respectively, be better (says he, plaintively)? Both are misleading, since NIR is the set of all things, and IR is a proper subset of NIR, it doesn't make much sense to label it non information resource(s) when it does indeed contain information resources. From that perspective IR and R makes somewhat more sense. That's true, and clarifying. Or, more formally, R is the set of all resources (?equivalent to things named by a URI). IR is a subset of that, defined as all the things which return 200 when you dereference them. NIR is then just R \ IR. Indeed, I just wrote pretty much the same thing, but with a looser definition at [1], snipped here: The only potential clarity I have on the issue, and why I've clipped above, is that I feel the /only/ property that distinguishes an IR from anything else in the universe is that it has a [transfer/transport]-protocol as a property of it. In the case of HTTP this would be anything that has an HTTP Interface as a property of it. If we say that anything with this property is a member of set X, then if an interaction with the thing named p:y, using protocol 'p:', is successful, p:y is a member of X. An X, of course, being what is currently called an Information Resource. Taking this approach would then position 303 as a clear opt-out built in to HTTP which allows a server to remain indifferent and merely point to some other X which may, or may not, give one more information as to what p:y refers to. [1] http://lists.w3.org/Archives/Public/www-tag/2011Oct/0078.html That's my understanding of things anyway. It's NIR that's of interest to this discussion, but there's no way of indicating within HTTP that a resource is in that set [1], only that something is in IR.
Correct, and I guess technically, and logically, HTTP can only ever have awareness of things which have an HTTP Interface as a property. So arguing for HTTP to cater for non-HTTP things seems a little illogical and, I guess, impossible. Back to your regularly scheduled argumentation... Aye, as always, carry on! Best, Nathan
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
On 10/21/11 3:09 AM, Leigh Dodds wrote: [SNIP] What I'm trying to draw out in this particular thread is specific benefits the #/303 additional abstraction brings. At the moment, they seem pretty small in comparison to the fantastic benefits we get from data integrated into the web.

Data is already integrated on the Web. The issue is quality and cost of said integration. People using the Web as an information space already work with Data. The problem is that said Data manifests as coarse-grained data objects (resources). Thus, people have to resort to brute-force integration of disparate data sources. A simple example: an inability to find stuff with precision. Ditto the inability to publish data object identifiers that have a high propensity for serendipitous discovery.

How does 303 on slash URIs help? It enables all those existing identifiers on the Web to serve as bona fide linked data oriented identifiers. Basically, this is about the fact that Web users do the following, and will continue to do so:

1. Use location names or data source names (URLs) as actual data object identifiers -- inherently ambiguous re. fidelity of fine grained linked data
2. Don't expect to be burdened with the mechanics of de-referenceable identifiers that act as names/handles -- and rightfully so.

Linked Data solution developers (client or server side) need to accept the following:

1. There are, and will always be, more slash based URIs than there ever will be hash based URIs -- blogging, tweeting, and commenting ensure that
2. Name and Address disambiguation is critical to any system that deals with fine grained data objects -- that's how it works elsewhere, and the Web's architecture already reflects this reality.

What about not doing a 303 on slash URIs, i.e., just a 200 OK? That's an option, but it cannot take the form of a replacement for HTTP 303. This option introduces certain requirements on the part of linked data clients that include:

1. local disambiguation of object Name and Address
2. a dependency on relation semantics, which ultimately leads to agreement challenges re. vocabularies -- remember, this whole effort is supposed to be about loose and late binding of data objects to vocabularies/schemas/ontologies.

Conclusion: the fundamental benefit of slash URIs and 303 boils down to non-disruptive manifestation of the Web's data space dimensions. Let's put existing global scale identifiers on the Web to good use. Technology vendors should take on the burden of handling linked data fidelity. -- Regards, Kingsley Idehen President CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
On Fri, Oct 21, 2011 at 8:34 AM, Lin Clark lin.w.cl...@gmail.com wrote:
> On Fri, Oct 21, 2011 at 1:15 PM, Jonathan Rees j...@creativecommons.org wrote:
>> If you adopt the httpRange-14 rule, what this does is make the Flickr
>> and Jamendo pages wrong, and if *they* agree, they will change their
>> metadata. The eventual advantage is that there will be no need to be
>> clear, since a different URI (or blank node) will clearly be used to
>> name the photo, and will be understood in that way.
>>
>> I feel you're doing a bait-and-switch here. The topic is what the
>> httpRange-14 rule does for you, NOT whether a different rule (such as
>> "just read the RDF") is better than it for some purposes, or what sort
>> of agreement we might want to attempt. If you want to do a comparison
>> of different rules, please change the subject line.
>
> I don't think this was a bait-and-switch. I think Leigh made clear that
> he was questioning whether we should spend so much time making pages
> (and people) wrong.

Come on, I never said making someone wrong was a virtue. I was just answering honestly the question of what would happen to those pages if we adopted the rule. Well, those pages would break. That would be sad. Jamendo and Flickr are negative examples. This is a criticism of the rule. Maybe that's enough reason to amend the rule; I don't know.

If you adopted a different rule, something else would break, like an application that reports on the content of RDF pages. Because current practice is so mixed, we will never end up with 100% compliance with ANY rule, even one that says that all references are indirect. But that's not what we were talking about. I wasn't trying to argue for or against the rule compared to alternatives. I was only trying to answer the question that was asked, which was what it does for you. Like all rules, it lowers entropy, and does so in a certain way that supports certain uses and doesn't support others.

> As he said: "Instead of starting out from a position that we *must* have
> two different resources, can we instead highlight to people the
> *benefits* of having different identifiers?" Telling someone they are
> wrong because they don't follow a rule that they don't understand or
> don't see a benefit to is a *must* position. Explaining how the
> httpRange-14 rule is better than another is explaining the *benefits* of
> having different identifiers.
>
> -Lin

There's a different question that I skipped over because it seems unrelated, which is whether you need different URIs for different things. I'm not certain how to answer that. This is an interoperability issue. If a URI U refers to two documents A and B, and I say U has title "Right", which document am I referring to, A or B? That is, which has that title? (Or author, etc.) Either you don't care, in which case there's no reason to say it, or you care, in which case you have to invent some additional signal to communicate the distinction.

The question of how many URIs you need has almost nothing to do with httpRange-14. It would arise no matter how you ended up choosing between direct vs. indirect.

Jonathan
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
On 10/21/11 8:57 AM, Nathan wrote:
> Norman Gray wrote:
>> Nathan, hello. On 2011 Oct 20, at 12:54, Nathan wrote:
>>> Norman Gray wrote:
>>>> Ugh: 'IR' and 'NIR' are ugly obscurantist terms (though reasonable in
>>>> their original context). Wouldn't 'Bytes' and 'Thing', respectively,
>>>> be better (says he, plaintively)?
>>>
>>> Both are misleading: since NIR is the set of all things, and IR is a
>>> proper subset of NIR, it doesn't make much sense to label it "non-
>>> information resource(s)" when it does indeed contain information
>>> resources. From that perspective, IR and R make somewhat more sense.
>>
>> That's true, and clarifying. Or, more formally: R is the set of all
>> resources (equivalent to things named by a URI?). IR is a subset of
>> that, defined as all the things which return 200 when you dereference
>> them. NIR is then just R \ IR.
>
> Indeed, I just wrote pretty much the same thing, but with a looser
> definition, at [1], snipped here:
>
> The only potential clarity I have on the issue, and why I've clipped
> above, is that I feel the /only/ property that distinguishes an IR from
> anything else in the universe is that it has a [transfer/transport]
> protocol as a property of it. In the case of HTTP, this would be
> anything that has an HTTP interface as a property. Say that anything
> with this property is a member of set X: if an interaction with the
> thing named p:y, using protocol 'p:', is successful, then p:y is a
> member of X. An X, of course, being what is currently called an
> Information Resource.
>
> Taking this approach would then position 303 as a clear opt-out built in
> to HTTP, which allows a server to remain indifferent and merely point to
> some other X which may, or may not, give one more information as to what
> p:y refers to.
>
> [1] http://lists.w3.org/Archives/Public/www-tag/2011Oct/0078.html
>
> That's my understanding of things, anyway.

>> It's NIR that's of interest to this discussion, but there's no way of
>> indicating within HTTP that a resource is in that set [1], only that
>> something is in IR.
> Correct, and I guess technically, and logically, HTTP can only ever have
> awareness of things which have an HTTP interface as a property. So
> arguing for HTTP to cater for non-HTTP things seems a little illogical
> and, I guess, impossible.
>
>> Back to your regularly scheduled argumentation...
>
> Aye, as always, carry on!

Nice explanation. You've just explained:

1. why http-scheme-based names/handles for data objects are powerful but unintuitive
2. why data object names, addresses, and representations must always be distinct.

The distinction between a URI (generic name/handle) and a URL (locator/address) remains the root cause of confusion. We have two *things* that are superficially identical (due to syntax) but conceptually different. The core concept is always the key to negating the superficial distraction associated with syntax :-)

Link:

1. http://tools.ietf.org/html/rfc3305 -- an interesting read re. URIs and URLs.

Best, Nathan

--
Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
On Fri, Oct 21, 2011 at 8:33 AM, Norman Gray nor...@astro.gla.ac.uk wrote:
> Nathan, hello.
>
>> It's NIR that's of interest to this discussion, but there's no way of
>> indicating within HTTP that a resource is in that set [1], only that
>> something is in IR.

The important distinction, I think, is not between one kind of resource and another, but between the manners in which a URI comes to be associated with a resource. Terminology is helpful, which is why people have latched onto NIR; one possibility is "direct" (for old-fashioned Web URIs) and "indirect" (for semweb / linked data), applied not to resources but to URIs. A direct URI always names an IR (in fact a particular one: the one at that URI), but an indirect one can name either an NIR or an IR (as in http://www.w3.org/2001/tag/2011/09/referential-use.html, and as deployed at http://dx.doi.org/). HR14a says, in effect, that all retrieval-enabled hashless URIs are direct, but other rules (like Ian Davis's) might say other things; the terms are useful independent of the architecture.

There might be situations in which 'NIR' is a useful category, but I don't know of any. If you say things like "303 implies NIR" (which is not justified by httpRange-14 or anything else), you get into trouble with indirectly named IRs like those at dx.doi.org. One could adopt a new rule that says an indirect URI cannot name an IR, in which case if you knew the IR/NIR classification you could know which kind of URI you had to use, and vice versa, but this seems limiting, unnecessary, and incompatible.

Jonathan
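One way to make the direct/indirect reading concrete is to spell out what a conforming client may conclude from a response. This is a sketch of my reading of the HR14a rule as stated above, not normative text; the URIs are hypothetical:

```python
# What an httpRange-14-following client may conclude about a URI's
# referent: 200 on a hashless URI -> "direct", the URI names the IR
# retrieved; 303 -> "indirect", the URI may name an NIR *or* an IR (as
# with dx.doi.org); a hash URI's referent is fixed by the RDF defining
# the fragment, so retrieval alone says nothing.

def hr14_conclusion(uri, status=None):
    if "#" in uri:
        return "no commitment: fragment identifier, referent set by the RDF"
    if status == 200:
        return "direct: names the information resource retrieved"
    if status == 303:
        return "indirect: may name an IR or a NIR"
    return "unknown: no retrieval evidence"

print(hr14_conclusion("http://example.org/galaxy/m31", 303))
```

Note that the 303 branch deliberately draws no IR/NIR conclusion, matching the point above that "303 implies NIR" is not justified by httpRange-14.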
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Leigh and all, hello.

On 2011 Oct 21, at 12:52, Leigh Dodds wrote:
> Hi, On 21 October 2011 08:47, Dave Reynolds dave.e.reyno...@gmail.com wrote:
> [...]
>> Suppose you have been using http://example.com/lovelypictureofm31 to
>> denote M31. Some data consumers use your URI to link their data on M31
>> to it. Some other consumers started linking to it in HTML as an IR
>> (because they like the picture and the accompanying information, even
>> though they don't care about the RDF). Now you have two groups of users
>> treating the URI in different ways. This probably doesn't matter right
>> now, but if you decide later on you need to separate them, then you
>> can't introduce a new URI (whether via 303 or a Content-Location
>> header) without breaking one or other use. Not the end of the world,
>> but it's not a refactoring if the test cases break :)
> [...]
> But I don't see how I'm breaking people linking to it as if it were an
> IR. That group of people are using my resource ambiguously in the first
> place. Their links will also still resolve to the same content.

There's always, in practice, going to be ambiguity in this space, either because data providers are ambiguous about what their URIs denote, or because data consumers misunderstand or misuse them. The 200/303 distinction is about trying to force providers to make their URIs unambiguous (in an IR/NIR sense). It's starting to sound, to me, as if the costs of this are subtle but messily real, and may well outweigh the benefits of a goal which is receding as more information providers produce ambiguous URIs, simply because their priorities are elsewhere (for example OGP or Facebook, if I'm understanding those two examples correctly). This is an argument for conceding defeat on the 200/303 thing.

I think we've been here before. Back in November 2010, there was a thread about Ian Davis's suggestion that NIRs should simply return RDF with a 200, explaining in that RDF that they're NIRs: http://blog.iandavis.com/2010/11/04/is-303-really-necessary/.

My understanding of that, at http://lists.w3.org/Archives/Public/public-lod/2010Nov/0115.html, was: httpRange-14 requires that a URI with a 200 response MUST be an IR; a URI with a 303 MAY be a NIR. Ian is (effectively) suggesting that a URI with a 200 response MAY be an IR, in the sense that it is defeasibly taken to be an IR unless this is contradicted by a self-referring statement within the RDF obtained from the URI.

The list of references after that message provides very interesting reading (the whole thread is good, and this current one is recapitulating lots of it). David Booth, in a couple of messages including http://lists.w3.org/Archives/Public/public-lod/2010Nov/0235.html, focuses on the ambiguity created by Ian's suggestion. What this current thread seems to be suggesting is that this ambiguity is there anyway, and it's just going to get worse, so the solutions are (a) that any information architect should be clear in their own mind about the IR/NIR distinction, and (b) that there should be ways of resolving the ambiguity in a non-heuristic way. Ian's November 2010 suggestion seems to do that.

All the best,

Norman

--
Norman Gray : http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK
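The defeasible reading of Ian Davis's suggestion can be sketched as a toy classifier. This is only an illustration of the rule as paraphrased above: the string matching stands in for a real RDF parser, and the foaf:Person example is a hypothetical self-description:

```python
# Defeasible classification per Ian Davis's suggestion (as read above):
# a 200 response is taken to be an IR *unless* the returned RDF contains
# a self-referring rdf:type statement typing the URI as a non-document.
# The line-based N-Triples matching below is a deliberately naive stand-in
# for real parsing.

def defeasible_classify(uri, status, ntriples):
    if status == 303:
        return "undetermined (may be a NIR)"
    if status == 200:
        marker = "<%s> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>" % uri
        for line in ntriples.splitlines():
            line = line.strip()
            if line.startswith(marker) and "Document" not in line:
                return "NIR by self-description"
        return "IR (defeasible default under 200)"
    return "unknown"

rdf = ("<http://example.org/alice> "
       "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
       "<http://xmlns.com/foaf/0.1/Person> .")
print(defeasible_classify("http://example.org/alice", 200, rdf))
```

David Booth's objection, mentioned above, is visible in the sketch: the classification now depends on fetching and correctly interpreting the body, not just the status code.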
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Dave, hello.

On 2011 Oct 20, at 22:31, Dave Reynolds wrote:
> Benefit 2: Conceptual cleanliness and hedging your bets
> [...] Even if we can't spot the practical problems right now,
> differentiating between the galaxy itself and some piece of data about
> the galaxy could turn out to be important in practice.

It is. I want to say that 'line 123 in this catalogue [an existing RDBMS] and line 456 in that one both refer to the same galaxy, but they give different values for its surface brightness'. There's no way I can articulate that unless I'm explicitly clear about the difference between a billion suns and a database row.

> [...] Perhaps benefit 2 could be reframed as being about forcing you to
> confront the map/territory distinction, so you end up doing better
> modelling -- whether or not you implement 303s.

I think that's _very_ true. Perhaps one can say that any information architect should understand the IR/NIR distinction, however they subsequently decide to represent it.

> I think the discussion Leigh was trying to start was: can we more
> clearly articulate those benefits of the 'right way'? I was taking a
> shot at that, maybe a very limited, off-target one.

While I think it's very important to be clear about precisely what one's URIs refer to, I'm starting to wonder if the benefits of the 'right way' (which is the IR/NIR and 200/303 distinction, right?) really are all that massive. I think your listing of the costs and benefits at http://lists.w3.org/Archives/Public/public-lod/2011Oct/0158.html is a useful summary.

> Most people I talk to grok the distinction; the hard bit is
> understanding why 303 redirects are a sensible way of making it, and
> caring about it enough to put those in place.

Yes: it's becoming clearer to me that this is what the discussion is really about, even though it started off being about the lament "why don't people understand this distinction?".

You also commented on ways to represent observational data:

> (1) Describe the observations explicitly using something like ISO O&M
> or the DataCube vocabulary:
>
>   <http://catalogue1.com/observation123> a qb:Observation ;
>       eg:galaxy <http://iau.org/id/galaxy/m31> ;
>       eg:brightness 6.5 ;
>       eg:obsdate '2011-10-10'^^xsd:date ;
>       qb:dataset <http://catalogue1.com/catalogue/2011> .
>
>   <http://catalogue2.com/observation456> a qb:Observation ;
>       eg:galaxy <http://iau.org/id/galaxy/m31> ;
>       eg:brightness 6.8 ;
>       eg:obsdate '2011-09-01'^^xsd:date ;
>       qb:dataset <http://catalogue2.com/catalogue/2011> .
>
> (2) Each catalogue gives its own URI to its understanding of the galaxy,
> so it can assert things directly about it without conflict:
>
>   <http://catalogue1.com/galaxy/m31> eg:brightness 6.5 ;
>       eg:correspondsTo <http://iau.org/id/galaxy/m31> .
>
>   <http://catalogue2.com/galaxy/m31> eg:brightness 6.8 ;
>       eg:correspondsTo <http://iau.org/id/galaxy/m31> .

For huge numbers of objects, the _only_ name they have is their number in some observational catalogue or other -- there's no canonical IAU name. In a current project, we're setting up the support to be able to say:

  <http://catalogue1.com/galaxy/123> cat1:brightness xxx .
  <http://catalogue2.com/galaxy/456> cat2:brightness yyy .
  <http://catalogue1.com/galaxy/123> owl:sameAs <http://catalogue2.com/galaxy/456> .

We probably also want to reify the database rows where the first two statements come from, in order to make last-modified-like statements about them, but whether we do that with a named graph, or some other way, is a problem we haven't had to confront quite yet.

> In *none* of those cases does it make any difference whether, when I
> dereference http://iau.org/id/galaxy/m31 in a browser, I get a web page
> saying "I denote the galaxy M31", or I get a 303 redirect to something
> like http://iau.org/doc/galaxy/m31 which in turn connegs to a web page
> saying "The URI you started with denoted the galaxy M31; me, I'm just a
> web page, you can tell me by the way I walk."

Well, I think it does matter, because in this case the thing named http://catalogue1.com/galaxy/123 could plausibly be either the galaxy or the database row (and I suppose I could claim the latter as a NIR, with a following wind), and I'd need to be able to state, somewhere, which it is. But that's handled by my providing some RDF somewhere which explains which it is: the problem is how to get to that RDF without drawing some ambiguous or wrong conclusions on the way.

Best wishes,

Norman

--
Norman Gray : http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK
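The owl:sameAs approach in the catalogue example can be sketched from the consumer's side: compute the sameAs closure with a tiny union-find, then gather the (possibly conflicting) brightness values per equivalence class. The URIs and values are the hypothetical ones from the message above:

```python
# Merge catalogue rows across owl:sameAs links: a minimal union-find
# gives the equivalence classes, then brightness values are grouped per
# class so conflicting observations can be reviewed side by side.

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

brightness = {
    "http://catalogue1.com/galaxy/123": 6.5,
    "http://catalogue2.com/galaxy/456": 6.8,
}
union("http://catalogue1.com/galaxy/123",
      "http://catalogue2.com/galaxy/456")

merged = {}
for uri, value in brightness.items():
    merged.setdefault(find(uri), []).append(value)

# One equivalence class holding both observed values.
print(merged)
```

Note that nothing here needs the IR/NIR distinction at all, which is Dave's point; Norman's reply is that you still need to know *what* each catalogue URI denotes before the sameAs link is even meaningful.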
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
On Fri, 2011-10-21 at 09:17 -0400, Jonathan Rees wrote:
[ . . . ]
> There's a different question that I skipped over because it seems
> unrelated, which is whether you need different URIs for different
> things.

+1

> I'm not certain how to answer that. This is an interoperability issue.
> If a URI U refers to two documents A and B, and I say U has title
> "Right", which document am I referring to, A or B? That is, which has
> that title? (Or author, etc.) Either you don't care, in which case
> there's no reason to say it, or you care, in which case you have to
> invent some additional signal to communicate the distinction.

Right, though I would call it an application issue rather than an interoperability issue, because whether or not it is important to distinguish the two depends entirely on the application. Ambiguity/unambiguity should not be viewed as an absolute, but as *relative* to a particular application or class of applications: a URI that is completely unambiguous to one application may be hopelessly ambiguous to a different application that requires finer distinctions. See "Resource Identity and Semantic Extensions: Making Sense of Ambiguity": http://dbooth.org/2010/ambiguity/paper.html

> The question of how many URIs you need has almost nothing to do with
> httpRange-14. It would arise no matter how you ended up choosing
> between direct vs. indirect.

+1. With or without httpRange-14, there will always be URIs that are unambiguous to some applications and ambiguous to others. This is the inescapable consequence of the fact that, for the most part, it is impossible to define anything completely unambiguously -- a principle well discussed and established in philosophy.

--
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily reflect those of his employer.
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Jonathan, hello.

On 2011 Oct 21, at 14:46, Jonathan Rees wrote:
> A direct URI always names an IR (in fact a particular one: the one at
> that URI), but an indirect one can name either an NIR or an IR (as in
> http://www.w3.org/2001/tag/2011/09/referential-use.html, and as
> deployed at http://dx.doi.org/). HR14a says (in effect) all
> retrieval-enabled hashless URIs are direct, but other rules (like Ian
> Davis's) might say other things; the terms are useful independent of
> the architecture. There might be situations in which 'NIR' is a useful
> category, but I don't know of any.

I can see that distinction, and the value in it. I still think that 'NIR' is a useful category -- in a way it's the simpler category of the two: you cannot download a NIR, no matter how many indirections you follow, whereas if you start following indirect links, you might end up at a direct link. Or: an NIR is one of the 'things' that's being talked about in the 'internet of things'.

> If you say things like "303 implies NIR" (which is not justified by
> httpRange-14 or anything else), [...]

I don't think anyone's so confused as to say "303 implies NIR". A lot of things would probably be simpler, though, if there were a 20x or 30x status code which did mean "this names an NIR, and the content is just commentary on, or depictions of, that thing" (that's not a suggestion, by the way!)

All the best,

Norman

--
Norman Gray : http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
On 10/21/11 10:53 AM, David Booth wrote:
> Right, though I would call it an application issue rather than an
> interoperability issue, because whether or not it is important to
> distinguish the two depends entirely on the application.
> Ambiguity/unambiguity should not be viewed as an absolute, but as
> *relative* to a particular application or class of applications: a URI
> that is completely unambiguous to one application may be hopelessly
> ambiguous to a different application that requires finer distinctions.
> See "Resource Identity and Semantic Extensions: Making Sense of
> Ambiguity": http://dbooth.org/2010/ambiguity/paper.html

+1

Examples of different applications/services where the above applies:

1. World Wide Web -- as a global information space.
2. World Wide Web -- as a global data space.
3. World Wide Web -- as a global knowledge space.

httpRange-14 enables Web users to straddle the items above without consequence. The hyperlink is still the driver of application experience.

>> The question of how many URIs you need has almost nothing to do with
>> httpRange-14. It would arise no matter how you ended up choosing
>> between direct vs. indirect.
>
> +1. With or without httpRange-14, there will always be URIs that are
> unambiguous to some applications and ambiguous to others. This is the
> inescapable consequence of the fact that, for the most part, it is
> impossible to define anything completely unambiguously -- a principle
> well discussed and established in philosophy.

+1

--
Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
On 21/10/2011 12:52, Leigh Dodds wrote:
> Hi,
>
> On 21 October 2011 08:47, Dave Reynolds dave.e.reyno...@gmail.com wrote:
>>> On 20 October 2011 10:34, Dave Reynolds dave.e.reyno...@gmail.com wrote:
>>>> If you have two resources and later on it turns out you only needed
>>>> one, no big deal: just declare their equivalence. If you have one
>>>> resource where later on it turns out you needed two, then you are
>>>> stuffed.
>>>
>>> Ed referred to refactoring. So I'm curious about refactoring from a
>>> single URI to two. Are developers necessarily stuffed if they start
>>> with one and later need two? For example, what if I later changed the
>>> way I'm serving data to add a Content-Location header (something that
>>> Ian has raised in the past, and Michael has mentioned again recently)
>>> which points to the source of the data being returned? Within the
>>> returned data I can include statements about the document at that URI
>>> referred to in the Content-Location header. Doesn't that kind of
>>> refactoring help?
>>
>> Helps, yes, but I don't think it solves everything. Suppose you have
>> been using http://example.com/lovelypictureofm31 to denote M31. Some
>> data consumers use your URI to link their data on M31 to it. Some
>> other consumers started linking to it in HTML as an IR (because they
>> like the picture and the accompanying information, even though they
>> don't care about the RDF). Now you have two groups of users treating
>> the URI in different ways. This probably doesn't matter right now, but
>> if you decide later on you need to separate them, then you can't
>> introduce a new URI (whether via 303 or a Content-Location header)
>> without breaking one or other use. Not the end of the world, but it's
>> not a refactoring if the test cases break :)
>>
>> Does that make sense?
>
> No, I'm still not clear. If I retain the original URI as the identifier
> for the galaxy and add either a redirect or a Content-Location, then I
> don't see how I break those linking their data to it, as their
> statements are still made about the original URI.
>
> But I don't see how I'm breaking people linking to it as if it were an
> IR. That group of people are using my resource ambiguously in the first
> place. Their links will also still resolve to the same content.

Ah, OK. So you introduce a new, different IR, but preserve the conneg so that old HTML page links to the picture still resolve. Yes, you are right; I think that does work.

Dave
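Leigh's Content-Location refactoring can be sketched as a response builder. The document URI below (the `.html` variant) is a hypothetical choice of mine, not something fixed by the thread:

```python
# Sketch of the refactoring agreed above: keep serving 200 at the
# original URI so old HTML links still resolve, but add a
# Content-Location header naming a new, distinct document URI that the
# returned RDF can then make statements about.

THING = "http://example.com/lovelypictureofm31"
DOC = "http://example.com/lovelypictureofm31.html"   # assumed new IR URI

def serve(uri):
    """Return (status, headers) for a GET on `uri`."""
    if uri == THING:
        return 200, {
            "Content-Type": "text/html",
            # Names the document carrying this representation, leaving the
            # request URI free to keep denoting the galaxy/photo.
            "Content-Location": DOC,
        }
    return 404, {}

status, headers = serve(THING)
print(status, headers["Content-Location"])
```

Both groups of existing users keep working: data consumers' statements still target the original URI, and HTML linkers still get a 200 with the same content.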