Re: RDF Update Feeds + URI time travel on HTTP-level
On Thu, 2009-11-26 at 00:04 +0000, Richard Cyganiak wrote: If you choose such a rather broad definition for agent-driven negotiation, then you surely must count the practice of sending different responses based on client IP or User-Agent header, both of which are common on the Web, as examples for server-driven conneg. And even different responses based on the client's Cookie header. -- Toby A Inkster mailto:m...@tobyinkster.co.uk http://tobyinkster.co.uk
Re: RDF Update Feeds + URI time travel on HTTP-level
At Wed, 25 Nov 2009 00:21:04 -0500, Michael Nelson wrote: Hi Erik, Thanks for your response. I'm just going to cherry pick a few bits from it: As an aside, which may or may not be related to Memento, do you think there is a useful distinction to be made between web archives which preserve the actual bytestream of an HTTP response made at a certain time (e.g., the Internet Archive) and CMSs that preserve the general content, but allow headers, advertisements, and so on to change (e.g., Wikipedia)? To see what I mean, visit: http://en.wikipedia.org/w/index.php?title=World_Wide_Web&oldid=9419736 and then: http://web.archive.org/web/20050213030130/en.wikipedia.org/wiki/World_Wide_Web I am not sure what the relationship is between these two resources. I'm not 100% sure either. I think this is a difficult problem in web archiving in general. The Wikipedia link with current content substituted is not exactly the 2005 version, but the IA version isn't really what a user would have seen in 2005 either (at least in terms of presentation). And: http://web.archive.org/web/20080103014411/http://www.cnn.com/ for example gives me at least a pop-up ad that is relative to today, not Jan 2008 (there may be better examples where today's content is in-lined, but the point remains the same). I can’t find the pop-up, but the point is well taken. The problem of what I call ‘breaking out’ of archived web content is a very real one when archived web sites are displayed without browser support, using URI ‘rewriting’ and other tricks. The possibility of coming up with a solution for this problem is one reason why I am very excited about this discussion. Still, I think the intention of IA is different from that of Wikipedia’s previous versions. IA attempts to capture and replay the web exactly as it was, while Wikipedia presents the essential content in the same way but surrounds it with the latest tools. While either solution would be helpful to somebody researching the history of a Wikipedia article or to somebody looking for the previous version, only IA’s approach gives you the advertisements, etc., that can be very helpful for researchers. There is the further issue that IA’s copy is third-party and in some ways more trustworthy. Whether sites can generally be trusted to maintain accurate archives of their own content is a question that has already been answered, in my opinion. (The answer is: they can’t.) See, e.g., [1]. As an aside, the Zoetrope system (http://doi.acm.org/10.1145/1498759.1498837) took an entirely different approach to this problem in its archives (see pp. 246-247). They basically took DOM dumps from the client and saved them, rather than a crawler-based URI approach. Thanks for the pointer. My confusion on this issue stems, I believe, from a longstanding confusion that I have had with the 302 Found response. My understanding of 302 Found has always been that, if I visit R and receive a 302 Found with Location R', my browser should continue to consider R the canonical version and use it for all further requests. If I bookmark after having been redirected from R to R', it is in fact R which should be bookmarked, and not R'. If I use my browser to send that link to a friend, my browser should send R, not R'. I believe that this is the meaning given to 302 Found in [3]. I am aware that browsers do not implement what I consider to be the correct behavior here, but it is the way that I understand the definition of 302 Found. Perhaps somebody could help me out by clarifying this for me? 
Firefox will attempt to do the right thing, but it depends on the client maintaining state about the original URI. If you dereference R, then get 302'd to R', a reload in Firefox will be on R and not R'. I hadn’t noticed this before, thank you for pointing it out. Obviously, if you email or share or probably even bookmark R', then this client-side state will be lost and 3rd party reloads will be relative to R' (in fact, that might be what you *want* to occur). But at least within a session, Firefox (and possibly other browsers) will reload wrt the original URI. Although it is not explicit in the current paper or presentation, we're planning on some method for having R' point back to R, to allow Memento-aware clients to know the original URI. We're not sure syntactically how it should be done (a value in the Alternates response header maybe?), but semantically we want R' to point to R. This... I think your email got cut off there. In any case, in the context of actual existing implementations of 302, I think Memento is doing the correct thing. That is, redirection from R to the appropriate content (R') based on conneg makes sense to me, for Memento, if what the user can bookmark and see is the conneg’ed URI (R'). My belief (see [2] and especially [3]) is that properly behaving
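To make the exchange above concrete, here is a minimal Python sketch (not a normative Memento implementation) of the 302 flow being discussed: dereference the original URI R with the experimental X-Accept-Datetime header, observe the 302, and keep R - not the Location target R' - as the canonical, bookmarkable URI. Host, path and date are hypothetical; http.client is used deliberately because it does not follow redirects on its own, so R and R' stay distinct.

import http.client

R_HOST, R_PATH = "example.org", "/resource"  # original URI R (hypothetical)

conn = http.client.HTTPConnection(R_HOST)
conn.request("GET", R_PATH, headers={
    # experimental header per the Memento work; the desired time T
    "X-Accept-Datetime": "Wed, 20 Mar 2008 00:00:00 GMT",
})
resp = conn.getresponse()

if resp.status == 302:
    r_prime = resp.getheader("Location")  # R': the negotiated archived version
    print("canonical URI to bookmark/share:", "http://%s%s" % (R_HOST, R_PATH))
    print("memento URI R':", r_prime)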
Re: RDF Update Feeds + URI time travel on HTTP-level
On Wed, Nov 25, 2009 at 6:08 PM, Michael Nelson m...@cs.odu.edu wrote: I disagree. I would say that agent-driven negotiation is by far the most common form of conneg in use today. Only it's not done through standardized means such as the Alternates header, but instead via language and format specific links embedded in HTML, e.g. Click here for the PDF version, or a Language/country-selector dropdown in the page header, or even via Javascript based selection. While the exact line between them might be hard to draw, I'd argue those aren't HTTP-level events, but instead are HTML-level events. In other words, I would call those examples navigation. In addition, navigation works well for things that can be expressed in HTML wrappers (e.g., click here for the PDF version), but not really for embed/img tags where you want to choose between, say, .png & .gif. I don't draw much of a distinction there, at least for the purposes of discussions like this; they are all URLs in an HTTP response message. Server driven conneg, in comparison, is effectively unused. Ditto for transparent negotiation. I think that is an unfair characterization. I won't guess as to how often it is done, but it is done. It is just not perceived by the user. I didn't mean to imply it wasn't done. As Richard (and Larry, in his referenced message) point out, User-Agent conneg is pretty common. I was just trying to point out that it's not used nearly as often as client selection. Almost every browser sends out various Accept request headers, and it is not uncommon to have Vary and TCN: Choice response headers (check responses from w3c.org, for example). When done with the 200 response + Content-Location header, the URI that the browser displays does not change. I used to use w3.org as an example too, but I've learned since that it's the exception, not the rule, for Web site design. So while I think you are describing agent-driven CN (or something very similar), I also think it would be desirable to go ahead and get the full monty and define the appropriate Accept header and allow server-driven & transparent CN. Agent-driven CN is still available for clients that wish to do so. I just don't understand the desire for server driven conneg when agent driven is the clear winner and has so many advantages; we'll have to agree to disagree on that; I think they are different modalities. Fair enough. I'm just offering you my advice based on my extensive experience in this space. You're free not to believe me, of course 8-) As long as you're also supporting agent driven conneg, I'm happy. - not needing to use the inconsistently-implemented Vary header, so there's no risk of cache pollution. see http://www.mnot.net/blog/2007/06/20/proxy_caching#comment-2989 - more visible to search engines - simpler for developers, as it's just links returned by the server like the rest of their app. no need for new server side modules either I would suggest these are among the reasons we champion the 302 response + Location header approach (as opposed to 200/Content-Location) -- it makes the negotiation more transparent Ah, I see. Yes, I agree that's a good design choice. You might also be interested to read this, where one of the RFC 2616 editors apologizes for supporting server driven conneg; http://www.alvestrand.no/pipermail/ietf-types/2006-April/001707.html Note that he refers to HTTP conneg being broken, but is actually only talking about server driven conneg. 
I would counter with the fact that CN features prominently in: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ http://www.w3.org/TR/cooluris/ Given the role CN plays in these recent documents, it would seem CN has some measure of acceptance in the LOD community. Content negotiation is a valuable tool, so I'm glad there's interest, but IMO, both of those documents misrepresent it by only describing the server-driven form. Mark.
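Since both sides point at live servers (w3c.org in particular) as places where server-driven and transparent CN can be observed, here is a small sketch that makes the machinery visible: send Accept-* request headers and print whatever negotiation-related response headers come back. Which of them appear depends entirely on the server; nothing here is Memento-specific.

import http.client

conn = http.client.HTTPConnection("www.w3.org")
conn.request("GET", "/", headers={
    "Accept": "text/html,application/xhtml+xml;q=0.9",
    "Accept-Language": "en;q=1.0, de;q=0.5",
})
resp = conn.getresponse()
# Vary and TCN are the server-driven / transparent CN fingerprints the
# thread mentions; Content-Location shows the negotiated variant's URI.
for name in ("Vary", "TCN", "Content-Location", "Content-Type", "Content-Language"):
    value = resp.getheader(name)
    if value:
        print(name + ":", value)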
Re: RDF Update Feeds + URI time travel on HTTP-level
Danny Ayers wrote: What Damian said. I keep all my treasures in Subversion, it seems to work. 3rd that; whilst the http time travel conversation goes on - I can't help feeling that going down the date header route is only going to end up in something nobody uses, because it doesn't provide any implementation details to the developer, and thus nobody will adopt it. subversion/webdav/deltav on the other hand, everybody knows; it already works, does the trick and would be easy to implement - essentially all we're saying is let's version control rdf, a concept we can all understand, and at worst the addition of an http response version header tag would pretty much solve exposing all this functionality through http/rest etc. We could handle exposing diffs etc via restful post/get params (?since=r6) - see the sketch below - and also expose different synchronisation endpoints for data, eg on a graph level or a resource level, or however a developer chooses to do it; the point is that simply specifying to use version control and one additional version response header will do the job. it's not perfect, it's not time travel; but it addresses the need in a familiar standards-based way that's been thoroughly thought through and tested; and moreover it'll allow us all to get on and sync our RDF now, rather than in 2 years when it's too late. all imho of course. the only thing I can see that remains is to determine the format / serialization of the updates, primarily deletes - we can take it for granted that all normal triples / quads are new, so all we need to do is find a way of saying X quad / triple has been removed. kind regards, and naive as ever, nathan
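Taking the suggestion literally, a rough sketch of such an endpoint, assuming nothing beyond plain HTTP: one invented response header (X-Graph-Version here) carries the revision, and ?since=rN returns a toy line-per-triple delta. The diff serialization is exactly the open question named above, so the +/- format below is purely illustrative.

from wsgiref.simple_server import make_server
from urllib.parse import parse_qs

CURRENT_REVISION = 7
DIFFS = {6: b"+ <s> <p> <o> .\n- <s> <p> <old> .\n"}  # toy revision store

def app(environ, start_response):
    since = parse_qs(environ.get("QUERY_STRING", "")).get("since", [None])[0]
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("X-Graph-Version", str(CURRENT_REVISION)),  # the one extra header
    ])
    if since:  # e.g. ?since=r6 -> return only the delta since revision 6
        return [DIFFS.get(int(since.lstrip("r")), b"")]
    return [b"<s> <p> <o> .\n"]  # full current graph

make_server("", 8000, app).serve_forever()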
Re: RDF Update Feeds + URI time travel on HTTP-level
Hi All, Apologies, feel like I'm wading in here - but none the less. The issue is how to update / sync RDF, so here's another approach / thought: timestamp the predicate in a triple. Thus you can query a graph as such:
- uri ?p{date} ?o
- ?s uri{date} ?o
- ?s ?p{date} uri
- ?s ?p{date} ?o
and obviously you'd have historic data by nature as well. please do tell me the flaw in my thinking. many regards, nathan
Re: RDF Update Feeds + URI time travel on HTTP-level
Nathan wrote: timestamp the predicate in a triple. please do tell me the flaw in my thinking. scrap that, sorry for the noise - it doesn't cater for indicating data has been removed. The point remains, however, that synchronisation (date or version) data should perhaps be in the RDF rather than outside the scope of RDF?
Re: RDF Update Feeds + URI time travel on HTTP-level
On Nov 25, 2009, at 3:51 AM, Nathan wrote: Danny Ayers wrote: What Damian said. I keep all my treasures in Subversion, it seems to work. 3rd that; whilst the http time travel conversation goes on - I can't help feeling that going down the date header route is only going to end up in something nobody uses; because it doesn't provide any implementation details to the developer, and thus nobody will adopt it. Nathan, Isn't it a bit early in the game to make such a statement? The research results from the Memento project were just published in a paper 2 weeks ago. Give us a little time and we'll have implementation guidelines up on the Memento web site. And, as I indicated before, we have plans to write this up as an I-D => RFC. Cheers Herbert == Herbert Van de Sompel Digital Library Research & Prototyping Los Alamos National Laboratory, Research Library http://public.lanl.gov/herbertv/ tel. +1 505 667 1267
Re: RDF Update Feeds + URI time travel on HTTP-level
Herbert Van de Sompel wrote: On Nov 25, 2009, at 3:51 AM, Nathan wrote: Danny Ayers wrote: What Damian said. I keep all my treasures in Subversion, it seems to work. 3rd that; whilst the http time travel conversation goes on - I can't help feeling that going down the date header route is only going to end up in something nobody uses; because it doesn't provide any implementation details to the developer, and thus nobody will adopt it. Nathan, Isn't it a bit early in the game to make such a statement? The research results from the Memento project were just published in a paper 2 weeks ago. Give us a little time and we'll have implementation guidelines up on the Memento web site. And, as I indicated before, we have plans to write this up as an I-D => RFC. certainly is, and as mentioned off list, the sincerest of apologies; I think Memento is a fascinating idea, and something that definitely needs to be spec'd and hopefully implemented. feeling the stress of an impending deadline, and it just so happens that some form of rdf synchronisation is needed, and thus any involvement from me was on my own private agenda of getting something client-passable working for next week; not the time to be sending off emails to mailing lists - think I'll be quiet till time is free again, unless I have something useful to contribute! many regards, nathan
Re: RDF Update Feeds + URI time travel on HTTP-level
Michael, On Wed, Nov 25, 2009 at 1:07 AM, Michael Nelson m...@cs.odu.edu wrote: What you describe is really close to what RFC 2616 calls Agent-driven Negotiation, which is how CN exists in the absence of Accept-* request headers. That's correct. But the TCN: Choice approach is introduced as an optimization. The idea is that if you know you prefer .en, .pdf and .gz then tell the server when making your original request and it will do its best to honor those requests. We think adding an orthogonal dimension for CN will be similar: if you know you prefer .en, .pdf, .gz and .20091031, then tell the server when making your original request and it will do its best to honor those requests. I understand. In practice, agent-driven CN is rarely done (I can only guess as to why). In practice, you get either server-driven (as defined in RFC 2616) or transparent CN (introduced in RFC 2616 (well, RFC 2068 actually), but really defined in RFCs 2295 & 2296). See: http://httpd.apache.org/docs/2.3/content-negotiation.html I disagree. I would say that agent-driven negotiation is by far the most common form of conneg in use today. Only it's not done through standardized means such as the Alternates header, but instead via language and format specific links embedded in HTML, e.g. Click here for the PDF version, or a Language/country-selector dropdown in the page header, or even via Javascript based selection. Server driven conneg, in comparison, is effectively unused. Ditto for transparent negotiation. So while I think you are describing agent-driven CN (or something very similar), I also think it would be desirable to go ahead and get the full monty and define the appropriate Accept header and allow server-driven & transparent CN. Agent-driven CN is still available for clients that wish to do so. I just don't understand the desire for server driven conneg when agent driven is the clear winner and has so many advantages; - not needing to use the inconsistently-implemented Vary header, so there's no risk of cache pollution. see http://www.mnot.net/blog/2007/06/20/proxy_caching#comment-2989 - more visible to search engines - simpler for developers, as it's just links returned by the server like the rest of their app. no need for new server side modules either You might also be interested to read this, where one of the RFC 2616 editors apologizes for supporting server driven conneg; http://www.alvestrand.no/pipermail/ietf-types/2006-April/001707.html Note that he refers to HTTP conneg being broken, but is actually only talking about server driven conneg. I think that makes for a pretty strong case against it, and I haven't even elaborated on the architectural problems I perceive with it (though some of the advantages above relate closely). Mark.
Re: RDF Update Feeds + URI time travel on HTTP-level
Mark, In practice, agent-driven CN is rarely done (I can only guess as to why). In practice, you get either server-driven (as defined in RFC 2616) or transparent CN (introduced in RFC 2616 (well, RFC 2068 actually), but really defined in RFCs 2295 & 2296). See: http://httpd.apache.org/docs/2.3/content-negotiation.html I disagree. I would say that agent-driven negotiation is by far the most common form of conneg in use today. Only it's not done through standardized means such as the Alternates header, but instead via language and format specific links embedded in HTML, e.g. Click here for the PDF version, or a Language/country-selector dropdown in the page header, or even via Javascript based selection. While the exact line between them might be hard to draw, I'd argue those aren't HTTP-level events, but instead are HTML-level events. In other words, I would call those examples navigation. In addition, navigation works well for things that can be expressed in HTML wrappers (e.g., click here for the PDF version), but not really for embed/img tags where you want to choose between, say, .png & .gif. Server driven conneg, in comparison, is effectively unused. Ditto for transparent negotiation. I think that is an unfair characterization. I won't guess as to how often it is done, but it is done. It is just not perceived by the user. Almost every browser sends out various Accept request headers, and it is not uncommon to have Vary and TCN: Choice response headers (check responses from w3c.org, for example). When done with the 200 response + Content-Location header, the URI that the browser displays does not change. Also, if you link directly to uncool URIs (e.g., foo.gif or bar.html), you won't see any traces of CN in the response because those URIs aren't subject to negotiation. So while I think you are describing agent-driven CN (or something very similar), I also think it would be desirable to go ahead and get the full monty and define the appropriate Accept header and allow server-driven & transparent CN. Agent-driven CN is still available for clients that wish to do so. I just don't understand the desire for server driven conneg when agent driven is the clear winner and has so many advantages; we'll have to agree to disagree on that; I think they are different modalities. - not needing to use the inconsistently-implemented Vary header, so there's no risk of cache pollution. see http://www.mnot.net/blog/2007/06/20/proxy_caching#comment-2989 - more visible to search engines - simpler for developers, as it's just links returned by the server like the rest of their app. no need for new server side modules either I would suggest these are among the reasons we champion the 302 response + Location header approach (as opposed to 200/Content-Location) -- it makes the negotiation more transparent You might also be interested to read this, where one of the RFC 2616 editors apologizes for supporting server driven conneg; http://www.alvestrand.no/pipermail/ietf-types/2006-April/001707.html Note that he refers to HTTP conneg being broken, but is actually only talking about server driven conneg. I would counter with the fact that CN features prominently in: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ http://www.w3.org/TR/cooluris/ Given the role CN plays in these recent documents, it would seem CN has some measure of acceptance in the LOD community. 
regards, Michael I think that makes for a pretty strong case against it, and I haven't even elaborated on the architectural problems I perceive with it (though some of the advantages above relate closely). Mark. Michael L. Nelson m...@cs.odu.edu http://www.cs.odu.edu/~mln/ Dept of Computer Science, Old Dominion University, Norfolk VA 23529 +1 757 683 6393 +1 757 683 4900 (f)
Re: RDF Update Feeds
FWIW, I had a quick look at the current caching support in LOD datasets [1] - not very encouraging, to be honest. Cheers, Michael [1] http://webofdata.wordpress.com/2009/11/23/linked-open-data-http-caching/ -- Dr. Michael Hausenblas LiDRC - Linked Data Research Centre DERI - Digital Enterprise Research Institute NUIG - National University of Ireland, Galway Ireland, Europe Tel. +353 91 495730 http://linkeddata.deri.ie/ http://sw-app.org/about.html From: Michael Hausenblas michael.hausenb...@deri.org Date: Sat, 21 Nov 2009 11:19:18 +0000 To: Hugh Glaser h...@ecs.soton.ac.uk, Georgi Kobilarov georgi.kobila...@gmx.de Cc: Linked Data community public-lod@w3.org Subject: Re: RDF Update Feeds Resent-From: Linked Data community public-lod@w3.org Resent-Date: Sat, 21 Nov 2009 11:19:57 +0000 Georgi, Hugh, Could be very simple by expressing: Pull our update-stream once per second/minute/hour in order to be *enough* up-to-date. Ah, Georgi, I see. You seem to emphasise the quantitative side whereas I just seem to want to flag what kind of source it is. I agree that Pull our update-stream once per second/minute/hour in order to be *enough* up-to-date should be available, however I think that the information regular/irregular vs. how frequent should be made available as well. My main use case is motivated from the LOD application-writing area. I figured that I quite often have written code that essentially does the same: based on the type of data-source it either gets a live copy of the data or uses already locally available data. Now, given that dataset publishers would declare the characteristics of their dataset in terms of dynamics, one could write such a LOD cache quite easily, I guess, abstracting the necessary steps and hence offering a reusable solution. I'll follow up on this one soon via a blog post with a concrete example. My main question would be: what do we gain if we explicitly represent these characteristics, compared to what HTTP provides in terms of caching [1]? One might want to argue that the 'built-in' features are sort of too fine-grained and there is a need for a data-source-level solution. We currently put things like <changefreq>monthly</changefreq> <changefreq>daily</changefreq> <changefreq>never</changefreq> in our semantic sitemaps, and these suggestions seem very similar. Eg http://dotac.rkbexplorer.com/sitemap.xml (And I think these frequencies may correspond to normal sitemaps.) So a naïve approach, if you want RDF, would be to use something very similar (and simple). Of course I am probably known for my naivety, which is often misplaced. Hugh, of course you're right (as often ;). Technically, this sort of information ('changefreq') is available via sitemaps. Essentially, one could lift this to RDF straightforwardly, if desired. If you look closely at what I propose, however, then you'll see that I aim at a sort of qualitative description which could drive my LOD cache (along with the other information I already have from the void:Dataset). Now, before I continue to argue here on a purely theoretical level, lemme implement a demo and come back once I have something to discuss ;) Cheers, Michael [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html -- Dr. Michael Hausenblas LiDRC - Linked Data Research Centre DERI - Digital Enterprise Research Institute NUIG - National University of Ireland, Galway Ireland, Europe Tel. 
+353 91 495730 http://linkeddata.deri.ie/ http://sw-app.org/about.html From: Hugh Glaser h...@ecs.soton.ac.uk Date: Fri, 20 Nov 2009 18:29:17 +0000 To: Georgi Kobilarov georgi.kobila...@gmx.de, Michael Hausenblas michael.hausenb...@deri.org Cc: Linked Data community public-lod@w3.org Subject: Re: RDF Update Feeds Sorry if I have missed something, but... We currently put things like <changefreq>monthly</changefreq> <changefreq>daily</changefreq> <changefreq>never</changefreq> in our semantic sitemaps, and these suggestions seem very similar. Eg http://dotac.rkbexplorer.com/sitemap.xml (And I think these frequencies may correspond to normal sitemaps.) So a naïve approach, if you want RDF, would be to use something very similar (and simple). Of course I am probably known for my naivety, which is often misplaced. Best Hugh On 20/11/2009 17:47, Georgi Kobilarov georgi.kobila...@gmx.de wrote: Hi Michael, nice write-up on the wiki! But I think the vocabulary you're proposing is too generally descriptive. Dataset publishers, once offering update feeds, should not only tell that/if their datasets are dynamic, but instead how dynamic they are. Could be very simple by expressing: Pull our update-stream once per second/minute/hour in order to be *enough* up-to-date. Makes sense? Cheers, Georgi -- Georgi Kobilarov www.georgikobilarov.com -----Original Message----- From: Michael Hausenblas [mailto:michael.hausenb...@deri.org] Sent: Friday, November 20, 2009 4:01 PM To: Georgi Kobilarov Cc: Linked Data community Subject: Re: RDF Update
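In the spirit of the caching survey in [1], a quick sketch of how one might probe a dataset URI for the caching metadata HTTP already provides (RFC 2616 section 13); dbpedia.org/resource/Paris is just an example target.

import http.client

conn = http.client.HTTPConnection("dbpedia.org")
conn.request("HEAD", "/resource/Paris")
resp = conn.getresponse()
# absent headers are the "not very encouraging" part
for name in ("Last-Modified", "ETag", "Cache-Control", "Expires"):
    print(name + ":", resp.getheader(name) or "(absent)")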
Re: RDF Update Feeds + URI time travel on HTTP-level
On Nov 23, 2009, at 9:02 PM, Herbert Van de Sompel wrote: On Nov 23, 2009, at 4:59 PM, Erik Hetzner wrote: At Mon, 23 Nov 2009 00:40:33 -0500, Mark Baker wrote: On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell ansell.pe...@gmail.com wrote: It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time based checkpoints. You seem to have read more into my argument than was there, and created a strawman; I agree with the above. My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore, a request for a representation of the state of Resource X at time T needs to be directed at the URI for Resource X at time T, not Resource X. I think this is a very compelling argument. Actually, I don't think it is. The issue was also brought up (in a significantly more tentative manner) in Pete Johnston's blog entry on eFoundations (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html ). Tomorrow, we will post a response that will try and show that the current state issue is - as far as we can see - not quite as written in stone as suggested above in the specs that matter in this case, i.e. Architecture of the World Wide Web and RFC 2616. Both are interestingly vague about this. Just to let you know that our response to some issues re Memento raised here and on Pete Johnston's blog post (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html ) is now available at: http://www.cs.odu.edu/~mln/memento/response-2009-11-24.html We have also submitted this as an inline Comment to Pete's blog, but Comments require approval and that has not happened yet. Greetings Herbert Van de Sompel == Herbert Van de Sompel Digital Library Research & Prototyping Los Alamos National Laboratory, Research Library http://public.lanl.gov/herbertv/ tel. +1 505 667 1267
Re: RDF Update Feeds + URI time travel on HTTP-level
Herbert, On Tue, Nov 24, 2009 at 6:10 PM, Herbert Van de Sompel hvds...@gmail.com wrote: Just to let you know that our response to some issues re Memento raised here and on Pete Johnston's blog post (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html) is now available at: http://www.cs.odu.edu/~mln/memento/response-2009-11-24.html Regarding the suggestion to use the Link header, I was thinking the same thing. But the way you describe it being used is different than how I would suggest it be used. Instead of providing a link to each available representation, the server would just provide a single link to the timegate. The client could then GET the timegate URI and find either the list of URIs (along with date metadata), or some kind of form-like declaration that would permit it to specify the date/time for which it desires a representation (e.g. Open Search). Perhaps this is what you meant by timemap, I can't tell, though I don't see a need for the use of the Accept header in that case if the client can either choose or construct a URI for the desired archived representation. As for the current state issue, you're right that it isn't a general constraint of Web architecture. I was assuming we were talking only about the origin server. Of course, any Web component can be asked for a representation of any resource, and they are free to answer those requests in whatever way suits their purpose, including providing historical versions. Mark.
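A sketch of this single-link variant, under the assumption that the server advertises its timegate in an HTTP Link header with a (not yet standardized) rel="timegate" value; the host and the timegate's own query interface are hypothetical.

import http.client
import re

conn = http.client.HTTPConnection("example.org")
conn.request("GET", "/resource")
resp = conn.getresponse()
link = resp.getheader("Link", "")
# e.g. Link: <http://example.org/timegate/resource>; rel="timegate"
m = re.search(r'<([^>]+)>;\s*rel="timegate"', link)
if m:
    timegate_uri = m.group(1)
    print("timegate:", timegate_uri)
    # a second GET against timegate_uri would then carry (or query for)
    # the desired datetime, or fetch the timemap/form-like description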
Re: RDF Update Feeds + URI time travel on HTTP-level
Good man, I couldn't help thinking there was a paper in that... 2009/11/22 Herbert Van de Sompel hvds...@gmail.com: hi all, (thanks Chris, Richard, Danny) In light of the current discussion, I would like to provide some clarifications regarding Memento: Time Travel for the Web, i.e. the idea of introducing HTTP content negotiation in the datetime dimension: (*) Some extra pointers: - For those who prefer browsing slides over reading a paper, there is http://www.slideshare.net/hvdsomp/memento-time-travel-for-the-web - Around mid next week, a video recording of a presentation I gave on Memento should be available at http://www.oclc.org/research/dss/default.htm - The Memento site is at http://www.mementoweb.org. Of special interest may be the proposed HTTP interactions for (a) web servers with internal archival capabilities such as content management systems, version control systems, etc (http://www.mementoweb.org/guide/http/local/) and (b) web servers without internal archival capabilities (http://www.mementoweb.org/guide/http/remote/). (*) The overall motivation for the work is the integration of archived resources into regular web navigation by making them available via their original URIs. The archived resources we have focused on in our experiments so far are those kept by (a) Web Archives such as the Internet Archive, WebCite, archive-it.org and (b) Content Management Systems such as wikis, CVS, ... The reason I pinged Chris Bizer about our work is that we thought that our proposed approach could also be of interest in the LoD environment. Specifically, the ability to get to prior descriptions of LoD resources by doing datetime content negotiation on their URI seemed appealing; e.g. what was the dbpedia description for the City of Paris on March 20 2008? This ability would, for example, allow analysis of (the evolution of) data over time. The requirement that is currently being discussed in this thread (which I interpret to be about approaches to selectively get updates for a certain LoD database) is not one I had considered using Memento for, thinking this was more in the realm of feed technologies such as Atom (as suggested by Ed Summers), or the pre-REST OAI-PMH (http://www.openarchives.org/OAI/openarchivesprotocol.html). (*) Regarding some issues that were brought up in the discussion so far: - We use an X header because that seems to be best practice when doing experimental work. We would very much like to eventually migrate to a real header, e.g. Accept-Datetime. - We are definitely considering and interested in some way to formalize our proposal in a specification document. We felt that the I-D/RFC path would have been the appropriate one, but are obviously open to other approaches. - As suggested by Richard, there is a bootstrapping problem, as there is with many new paradigms that are introduced. I trust LoD developers fully understand this problem. Actually, the problem is not only at the browser level but also at the server level. We are currently working on a Firefox plug-in that, when ready, will be available through the regular channels. And we have successfully (and experimentally) modified the Mozilla code itself to be able to demonstrate the approach. We are very interested in getting support in other browsers, natively or via plug-ins. We also have some tools available to help with initial deployment (http://www.mementoweb.org/tools/). 
One is a plug-in for the mediawiki platform; when installed, the wiki natively supports datetime content negotiation and redirects a client to the history page that was active at the datetime requested in the X-Accept-Datetime header. We just started a Google group for developers interested in making Memento happen for their web servers, content management systems, etc. (http://groups.google.com/group/memento-dev/). (*) Note that the proposed solution also leverages the OAI-ORE specification (fully compliant with LoD best practice) as a mechanism to support discovery of archived resources. I hope this helps to get a better understanding of what Memento is about, and what its current status is. Let me end by stating that we would very much like to get these ideas broadly adopted. And we understand we will need a lot of help to make that happen. Cheers Herbert == Herbert Van de Sompel Digital Library Research & Prototyping Los Alamos National Laboratory, Research Library http://public.lanl.gov/herbertv/ tel. +1 505 667 1267 -- http://danny.ayers.name
Re: RDF Update Feeds + URI time travel on HTTP-level
2009/11/25 Michael Nelson m...@cs.odu.edu: In practice, agent-driven CN is rarely done (I can only guess as to why). In practice, you get either server-driven (as defined in RFC 2616) or transparent CN (introduced in RFC 2616 (well, RFC 2068 actually), but really defined in RFCs 2295 & 2296). See: http://httpd.apache.org/docs/2.3/content-negotiation.html My guess is that it relies on users making decisions that they aren't generally qualified, or concerned enough, to make. Language is basically a constant from the user's operating system configuration, and format differences do not affect users enough to warrant giving them a choice between XHTML and HTML, or JPG and PNG, for example. I think browser designers see CN as a good thing for them, but basically irrelevant to users, and hence they decide it is easiest to just automate the process using server or transparent negotiation. Similar reasoning explains why Apache goes so far to try to break down what are likely unintentional mix-ups with equal q/qs value combinations, as it reduces confusion for the user. The fact that the server and transparent CN processes rely on servers for part of the decision (qs) makes it perfectly fine for them to make the tie-breaker decision, in my opinion. There is basically no reason why the choice the server makes will be inconvenient for users, as they already said that both formats or languages were acceptable in some way through the Accept- headers. Combined with the server's knowledge, the tie-breaker will only choose one slightly better format compared to another decent format, resulting in a win-win scenario according to the user's declared preferences. As long as the server sends back the real Content-Type it chose, I am happy. Cheers, Peter
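The tie-breaking arithmetic described above is easy to picture. A toy illustration (all values invented) of combining the client's q with the server's qs, which is precisely the computation transparent CN delegates to the server:

# client says both are acceptable (from Accept); server prefers PNG (qs)
client_q = {"image/png": 1.0, "image/gif": 1.0}
server_qs = {"image/png": 0.9, "image/gif": 0.7}

# pick the variant maximizing q * qs -- the server-side tie-breaker
best = max(server_qs, key=lambda t: client_q.get(t, 0.0) * server_qs[t])
print("chosen variant:", best)  # image/png: a win-win per the declared preferences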
Re: RDF Update Feeds + URI time travel on HTTP-level
On Mon, Nov 23, 2009 at 1:01 AM, Peter Ansell ansell.pe...@gmail.com wrote: The issue with requiring people to direct requests at the URI for the Resource X at time T is that the circular linking issue I described previously comes into play, because people need to pre-engineer their URIs to be compatible with a temporal dimension. I would recommend the use of a query parameter. If the user didn't know exactly what time scales were used by the server they would either need to follow a roughly drawn up convention, such as YYYY/MM/DD/meaningfulresourcename, or they would have to find an index somewhere, neither of which are as promising for the future of the web as having the ability to add another header to provide the desired behaviour IMO. I'm not sure what criteria you're basing that evaluation on, but IME it's far simpler to deploy a new relation type than a new HTTP header. Headers are largely opaque to Web developers. The documentation of the Vary header [1] seems to leave the situation open as to whether the server needs to be concerned about which, if any, headers dictate which resource representation is to be returned. Caching in the context of HTTP/1.1 may have been designed to be temporary, but I see no particular reason why a temporal Accept-* header, together with the possibility of its addition to Vary, couldn't be used on the absolute time dimension. It seems much cleaner than adding an extra command to HTTP, or requiring some other non-HTTP mechanism altogether. The extra header would never stop a server from returning the current version if it doesn't recognise the header, or it doesn't keep a version history, so it should be completely backwards compatible. Yes, Vary should, in theory, be used for this purpose. Unfortunately, in practice, due to a bug in IE, it has the effect of disabling caching in the browser and so you don't see it used very much, at least not for browser based applications; http://www.ilikespam.com/blog/internet-explorer-meets-the-vary-header Mark.
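For concreteness, a minimal sketch of the serving side of Peter's suggestion: if the representation depends on a temporal header, advertise that in Vary so caches key on it. The header name follows the Memento experiment; whether deployed caches and IE-era browsers cope with Vary is exactly the caveat above.

from wsgiref.simple_server import make_server

def app(environ, start_response):
    # CGI-style name for the X-Accept-Datetime request header
    requested = environ.get("HTTP_X_ACCEPT_DATETIME")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Vary", "X-Accept-Datetime"),  # cache key must include the header
    ])
    return [("version for: %s\n" % (requested or "now")).encode()]

make_server("", 8001, app).serve_forever()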
Re: RDF Update Feeds + URI time travel on HTTP-level
At Mon, 23 Nov 2009 00:40:33 -0500, Mark Baker wrote: On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell ansell.pe...@gmail.com wrote: It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time based checkpoints. You seem to have read more into my argument than was there, and created a strawman; I agree with the above. My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore, a request for a representation of the state of Resource X at time T needs to be directed at the URI for Resource X at time T, not Resource X. I think this is a very compelling argument. On the other hand, there is nothing I can see that prevents one URI from representing another URI as it changes through time. This is already the case with, e.g., http://web.archive.org/web/*/http://example.org, which represents the URI http://example.org at all times. So this URI could, perhaps, be a target for X-Accept-Datetime headers. There is something else that I find problematic about the Memento proposal. Archival versions of a web page are too important to hide inside HTTP headers. To take the canonical example, if I am viewing http://oakland.example.org/weather, I don’t want the fact that I am viewing historical weather information to be hidden in the request headers. Furthermore, if I am viewing resource X as it appeared at time T1, I should *not* be able to copy that URI and send it to a friend, or use it as a reference in a document, only to have them see the URI as it appears at time T2. I think that those of us in the web archiving community [1] would very much appreciate a serious look by the web architecture community into the problem of web archiving. The problem of representing and resolving the tuple (URI, time) is a question which has not yet been adequately dealt with. best, Erik Hetzner 1. Those unfamiliar with web archives are encouraged to visit http://web.archive.org/, http://www.archive-it.org/, http://www.vefsafn.is/, http://webarchives.cdlib.org/, ... ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: RDF Update Feeds + URI time travel on HTTP-level
2009/11/24 Erik Hetzner erik.hetz...@ucop.edu: At Mon, 23 Nov 2009 00:40:33 -0500, Mark Baker wrote: On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell ansell.pe...@gmail.com wrote: It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time based checkpoints. You seem to have read more into my argument than was there, and created a strawman; I agree with the above. My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore, a request for a representation of the state of Resource X at time T needs to be directed at the URI for Resource X at time T, not Resource X. I think this is a very compelling argument. On the other hand, there is nothing I can see that prevents one URI from representing another URI as it changes through time. This is already the case with, e.g., http://web.archive.org/web/*/http://example.org, which represents the URI http://example.org at all times. So this URI could, perhaps, be a target for X-Accept-Datetime headers. This is still a different URI though, and requires you to know that web.archive.org exists and that it has in fact trawled example.org. There is something else that I find problematic about the Memento proposal. Archival versions of a web page are too important to hide inside HTTP headers. The clean aspect of using headers is that you don't have to munge the URI or attach it to the path of another URI in order to make the process work. To take the canonical example, if I am viewing http://oakland.example.org/weather, I don’t want the fact that I am viewing historical weather information to be hidden in the request headers. The user-agent could help here. Furthermore, if I am viewing resource X as it appeared at time T1, I should *not* be able to copy that URI and send it to a friend, or use it as a reference in a document, only to have them see the URI as it appears at time T2. Current web citation methods typically require that you put Accessed on DD MM YY next to the URI if you want to publish it. If you were viewing it at T1 and that wasn't the current version then your user-agent would need to let you know that you were not viewing the most up-to-date copy of the resource. I think that those of us in the web archiving community [1] would very much appreciate a serious look by the web architecture community into the problem of web archiving. The problem of representing and resolving the tuple (URI, time) is a question which has not yet been adequately dealt with. It would still be nice to solve the issue in general so that we don't have to rely on archiving services in order to get past versions if you could do it by negotiating directly with the original server. Cheers, Peter
Re: RDF Update Feeds + URI time travel on HTTP-level
At Tue, 24 Nov 2009 10:14:01 +1000, Peter Ansell wrote: 2009/11/24 Erik Hetzner erik.hetz...@ucop.edu: […] On the other hand, there is nothing I can see that prevents one URI from representing another URI as it changes through time. This is already the case with, e.g., http://web.archive.org/web/*/http://example.org, which represents the URI http://example.org at all times. So this URI could, perhaps, be a target for X-Accept-Datetime headers. This is still a different URI though, and requires you to know that web.archive.org exists and that it has in fact trawled example.org. I agree. I was trying to suggest that, while I agree with Mark Baker that: all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore, a request for a representation of the state of Resource X at time T needs to be directed at the URI for Resource X at time T, not Resource X. there could conceivably be a resource, e.g., http://web.archive.org/web/*/http://example.org/, whose representation could vary based on HTTP headers, because it represents all versions of another resource http://example.org/ as that other resource varied across time. The clean aspect of using headers is that you don't have to munge the URI or attach it to the path of another URI in order to make the process work. I agree that it is nice to be able to not munge URIs to get archival content. Rewriting URIs for archived web content is a very difficult task which is prone to error, and if a user is browsing a web archive they often end up with ‘live’ (unarchived) web content in embeds, etc. instead of the archived content. But if the tradeoff for not munging URIs is to hide the archival nature of a resource in the HTTP headers, I don’t think it is worth it. To take the canonical example, if I am viewing http://oakland.example.org/weather, I don’t want the fact that I am viewing historical weather information to be hidden in the request headers. The user-agent could help here. Perhaps it could, but I don’t think overloading the meaning of the resource that currently represents the current weather with historical weather data is a good idea. Current web citation methods typically require that you put Accessed on DD MM YY next to the URI if you want to publish it. If you were viewing it at T1 and that wasn't the current version then your user-agent would need to let you know that you were not viewing the most up-to-date copy of the resource. I would prefer to move away from current web citation methods. These methods provide no way for an author to ensure that (as much as possible) a reader will encounter the same text that the author read, and they provide no way for the typical reader to find the text as it was read by the author. If we are enhancing user agents and requiring user interaction, why not enhance a user agent with a feature that, given resource X at the current time T, directs a user to a new URI which uniquely identifies resource X at time T, a URI that can be copied & pasted as a whole into a document. Then the author can be reasonably assured that a reader will be viewing the same content the author viewed. I think that those of us in the web archiving community [1] would very much appreciate a serious look by the web architecture community into the problem of web archiving. The problem of representing and resolving the tuple (URI, time) is a question which has not yet been adequately dealt with. 
It would still be nice to solve the issue in general so that we don't have to rely on archiving services in order to get past versions, if you could do it by negotiating directly with the original server. Agreed! Furthermore, it would be nice to solve the problem in such a way that: a) the server could provide the past version; b) failing that, web archive A could provide the past version; c) failing that, web archive B could provide the past version; d) and so on. best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
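A sketch of that fallback chain: ask the origin server itself for the past version first, then fall back to web archives in order. The X-Accept-Datetime header is the thread's experimental header; the second archive URI pattern is a made-up placeholder, and real code would need far more robust URI handling.

import http.client
from urllib.parse import urlsplit

ARCHIVE_PATTERNS = [
    "http://web.archive.org/web/{ts}/{uri}",  # (b) Internet Archive layout
    "http://archive-b.example/{ts}/{uri}",    # (c) hypothetical archive B
]

def fetch(uri, accept_datetime):
    parts = urlsplit(uri)
    conn = http.client.HTTPConnection(parts.netloc)
    conn.request("GET", parts.path or "/",
                 headers={"X-Accept-Datetime": accept_datetime})
    return conn.getresponse()

def resolve(uri, ts, accept_datetime):
    resp = fetch(uri, accept_datetime)  # (a) the origin server first
    if resp.status in (200, 302):
        return resp
    for pattern in ARCHIVE_PATTERNS:  # (b), (c), ... archives in turn
        resp = fetch(pattern.format(ts=ts, uri=uri), accept_datetime)
        if resp.status == 200:
            return resp
    return None  # (d) ran out of options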
Re: RDF Update Feeds + URI time travel on HTTP-level
On Nov 23, 2009, at 4:59 PM, Erik Hetzner wrote: At Mon, 23 Nov 2009 00:40:33 -0500, Mark Baker wrote: On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell ansell.pe...@gmail.com wrote: It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time based checkpoints. You seem to have read more into my argument than was there, and created a strawman; I agree with the above. My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore, a request for a representation of the state of Resource X at time T needs to be directed at the URI for Resource X at time T, not Resource X. I think this is a very compelling argument. Actually, I don't think it is. The issue was also brought up (in a significantly more tentative manner) in Pete Johnston's blog entry on eFoundations (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html ). Tomorrow, we will post a response that will try and show that the current state issue is - as far as we can see - not quite as written in stone as suggested above in the specs that matter in this case, i.e. Architecture of the World Wide Web and RFC 2616. Both are interestingly vague about this. On the other hand, there is nothing I can see that prevents one URI from representing another URI as it changes through time. This is already the case with, e.g., http://web.archive.org/web/*/http://example.org, which represents the URI http://example.org at all times. So this URI could, perhaps, be a target for X-Accept-Datetime headers. That is actually what we do in Memento (see our paper http://arxiv.org/abs/0911.1112), and we recognize two cases here: (1) If the web server does not keep track of its own archival versions, then we must rely on archival versions that are stored elsewhere, i.e. in Web Archives. In this case, the original server who receives the request can redirect the client to a resource like the one you mention above, i.e. a resource that stands for archived versions of another resource. Note that this redirect is a simple redirect like the ones that happen all the time on the Web. This is not a redirect that is part of a datetime content negotiation flow, rather a redirect that occurs because the server has detected an X-Accept-Datetime header. Now, we don't want to overload the existing http://web.archive.org/web/*/http://example.org as you suggest, but rather choose to introduce a special-purpose resource that we call a TimeGate: http://web.archive.org/web/timegate/http://example.org . And we indeed introduce this resource as a target for datetime content negotiation. (2) If the web server does keep track of its own archival versions (think CMS), then it can handle requests for old versions locally, as it has all the information that is required to do so. In this case, we could also introduce a special-purpose, distinct, TimeGate on this server, and have the original resource redirect to it. That would make this case in essence the same as (1) above. 
This, however, seemed like a bit of overkill, and we felt that the original resource and the TimeGate could coincide, meaning datetime content negotiation occurs directly against the original resource. That is, the URI that represents the resource as it evolves over time is the URI of the resource itself. It stands for past and present versions. The present version is delivered (200 OK) from that URI itself (business as usual); archived versions are delivered from other resources via content negotiation (302 with Location different than the original URI). In both (1) and (2) the original resource plays a role in the framework, either because it redirects to an external TimeGate that performs the datetime content negotiation, or because it performs the datetime content negotiation itself. And we actually think it is quite essential that this original resource is involved. It is the URI of the original resource by which the resource has been known as it evolved over time. It makes sense to be able to use that URI to try and get to its past versions. And by get, I don't mean search for it, but rather use the network to get there. After all, we all go by the same name irrespective of the day you talk to us. Or we have the same Linked Data URI irrespective of the day it is dereferenced. Why would we suddenly need a new URI when we want to see what the LoD description for any of us was, say, a year ago? Why must we prevent that this same URI helps us to get to prior versions?
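A sketch of case (2), where a server that keeps its own history acts as its own TimeGate: the original URI answers 200 for the current state (business as usual) and 302s to a distinct archived-version URI when the request carries the experimental datetime header. The version index and URI layout are invented for illustration; real code would parse the requested datetime and pick the closest version.

from wsgiref.simple_server import make_server

VERSIONS = {"20080320": "/resource/versions/20080320"}  # toy history

def app(environ, start_response):
    asked = environ.get("HTTP_X_ACCEPT_DATETIME")
    if asked:  # datetime conneg requested -> 302 to the archived version
        start_response("302 Found", [("Location", VERSIONS["20080320"])])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"current state of the resource\n"]

make_server("", 8002, app).serve_forever()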
Re: RDF Update Feeds + URI time travel on HTTP-level
2009/11/22 Richard Cyganiak rich...@cyganiak.de: On 20 Nov 2009, at 19:07, Chris Bizer wrote: [snips] From a web architecture POV it seems pretty solid to me. Doing stuff via headers is considered bad if you could just as well do it via links and additional URIs, but you can argue that the time dimension is such a universal thing that a header-based solution is warranted. Sounds good to me too, but x-headers are a jump, I think perhaps it's a question worthy of throwing at the W3C TAG - pretty sure they've looked at similar stuff in the past, but things are changing fast... From what I can gather, proper diffs over time are hard (long before you get to them logics). But Web-like diffs don't have to be - they can't be any less reliable than my online credit card statement. Bit worrying there are so many different approaches available, sounds like there could be a lot of coding time wasted. But then again, it might well be one for evolution - and in the virtual world trying stuff out is usually worth it. The main drawback IMO is that existing clients, such as all web browsers, will be unable to access the archived versions, because they don't know about the header. If you are archiving web pages or RDF documents, then you could add links that lead clients to the archived versions, but that won't work for images, PDFs and so forth. Hmm. For one, browsers are in flux; for two, you probably wouldn't expect that kind of agent to give you anything but the latest. If I need last year's version, I follow my nose through URIs (as in svn etc) - that kind of thing has to be a fallback, imho. In summary, I think it's pretty cool. Cool idea, for sure. It is something strong... ok, temporal stuff should be available down at quite a low level, especially given that things like xmpp will be bouncing around - but I reckon Richard's right in suggesting the plain old URI thing will currently serve most purposes. Cheers, Danny. -- http://danny.ayers.name
Re: RDF Update Feeds + URI time travel on HTTP-level
On 22 Nov 2009, at 09:39, Danny Ayers wrote: 2009/11/22 Richard Cyganiak rich...@cyganiak.de: On 20 Nov 2009, at 19:07, Chris Bizer wrote: [snips] From a web architecture POV it seems pretty solid to me. Doing stuff via headers is considered bad if you could just as well do it via links and additional URIs, but you can argue that the time dimension is such a universal thing that a header-based solution is warranted. Sounds good to me too, but x-headers are a jump, I think perhaps it's a question worthy of throwing at the W3C TAG - pretty sure they've looked at similar stuff in the past, but things are changing fast... See also http://tools.ietf.org/html/rfc3253 Subversion is a partial deltav implementation. It may well be the only deployed implementation. Damian
Re: RDF Update Feeds + URI time travel on HTTP-level
Damian Steer wrote: On 22 Nov 2009, at 09:39, Danny Ayers wrote: 2009/11/22 Richard Cyganiak rich...@cyganiak.de: On 20 Nov 2009, at 19:07, Chris Bizer wrote: [snips] From a web architecture POV it seems pretty solid to me. Doing stuff via headers is considered bad if you could just as well do it via links and additional URIs, but you can argue that the time dimension is such a universal thing that a header-based solution is warranted. Sounds good to me too, but x-headers are a jump, I think perhaps it's a question worthy of throwing at the W3C TAG - pretty sure they've looked at similar stuff in the past, but things are changing fast... See also http://tools.ietf.org/html/rfc3253 Subversion is a partial deltav implementation. It may well be the only deployed implementation. surely virtuoso webdav w/ ods briefcase can be classed as a deployed implementation; unsure of status re forking etc but most of it's there and functioning v well. nathan
Re: RDF Update Feeds + URI time travel on HTTP-level
Nathan wrote: Damian Steer wrote: On 22 Nov 2009, at 09:39, Danny Ayers wrote: 2009/11/22 Richard Cyganiak rich...@cyganiak.de: On 20 Nov 2009, at 19:07, Chris Bizer wrote: [snips] From a web architecture POV it seems pretty solid to me. Doing stuff via headers is considered bad if you could just as well do it via links and additional URIs, but you can argue that the time dimension is such a universal thing that a header-based solution is warranted. Sounds good to me too, but x-headers are a jump, I think perhaps it's a question worthy of throwing at the W3C TAG - pretty sure they've looked at similar stuff in the past, but things are changing fast... See also http://tools.ietf.org/html/rfc3253 Subversion is a partial deltav implementation. It may well be the only deployed implementation. surely virtuoso webdav w/ ods briefcase can be classed as a deployed implementation; unsure of status re forking etc but most of it's there and functioning v well. nathan Nathan, Yes, but as usual we prefer to wait for some kind of consensus, and then we just put the relevant aspect of Virtuoso into play. In a nutshell, this is why we committed to industry standards from the get-go, since doing so reduces this kind of work to functionality orchestration :-) WebDAV, Atom Pub, GData etc.. have all existed inside Virtuoso for a long time, but on their own the net effect has sometimes been confusion (due to value pyramid inversion on the part of its beholders). We also see XMPP (which you've alluded to recently, and bugged about by Danbri for some time) and XMPP++ (Google Wave) as interesting. Ditto PubSubHubbub etc.. Also note, replication and synchronization e.g., via transaction logs (in the most sophisticated cases) is something Virtuoso has handled across SQL DBMS engines that provide API access to transaction logs for eons, so this is all very familiar territory. I still remember confusion, at the advent of blogging, when we indicated the existence of Atom and RSS aggregation and indexing support inside Virtuoso (sure you can Google up on that) :-) Giovanni: why isn't the RDFsync protocol (from yourself and Orri) part of this conversation? My silence during this conversation has been deliberate :-) -- Regards, Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen President & CEO, OpenLink Software Web: http://www.openlinksw.com
Re: RDF Update Feeds + URI time travel on HTTP-level
hi all,

(thanks Chris, Richard, Danny)

In light of the current discussion, I would like to provide some clarifications regarding Memento: Time Travel for the Web, i.e. the idea of introducing HTTP content negotiation in the datetime dimension:

(*) Some extra pointers:

- For those who prefer browsing slides over reading a paper, there is http://www.slideshare.net/hvdsomp/memento-time-travel-for-the-web
- Around the middle of next week, a video recording of a presentation I gave on Memento should be available at http://www.oclc.org/research/dss/default.htm
- The Memento site is at http://www.mementoweb.org. Of special interest may be the proposed HTTP interactions for (a) web servers with internal archival capabilities, such as content management systems, version control systems, etc. (http://www.mementoweb.org/guide/http/local/) and (b) web servers without internal archival capabilities (http://www.mementoweb.org/guide/http/remote/).

(*) The overall motivation for the work is the integration of archived resources into regular web navigation by making them available via their original URIs. The archived resources we have focused on in our experiments so far are those kept by (a) web archives such as the Internet Archive, WebCite, archive-it.org, and (b) content management systems such as wikis, CVS, ...

The reason I pinged Chris Bizer about our work is that we thought that our proposed approach could also be of interest in the LoD environment. Specifically, the ability to get to prior descriptions of LoD resources by doing datetime content negotiation on their URI seemed appealing; e.g. what was the DBpedia description for the City of Paris on March 20 2008? This ability would, for example, allow analysis of (the evolution of) data over time.

The requirement that is currently being discussed in this thread (which I interpret to be about approaches to selectively get updates for a certain LoD database) is not one I had considered using Memento for, thinking this was more in the realm of feed technologies such as Atom (as suggested by Ed Summers), or the pre-REST OAI-PMH (http://www.openarchives.org/OAI/openarchivesprotocol.html).

(*) Regarding some issues that were brought up in the discussion so far:

- We use an X header because that seems to be best practice when doing experimental work. We would very much like to eventually migrate to a real header, e.g. Accept-Datetime.
- We are definitely considering, and interested in, some way to formalize our proposal in a specification document. We felt that the I-D/RFC path would have been the appropriate one, but are obviously open to other approaches.
- As suggested by Richard, there is a bootstrapping problem, as there is with many new paradigms that are introduced. I trust LoD developers fully understand this problem. Actually, the problem exists not only at the browser level but also at the server level. We are currently working on a Firefox plug-in that, when ready, will be available through the regular channels. And we have successfully (and experimentally) modified the Mozilla code itself to be able to demonstrate the approach. We are very interested in getting support in other browsers, natively or via plug-ins. We also have some tools available to help with initial deployment (http://www.mementoweb.org/tools/). One is a plug-in for the MediaWiki platform; when installed, the wiki natively supports datetime content negotiation and redirects a client to the history page that was active at the datetime requested in the X-Accept-Datetime header.
We just started a Google group for developers interested in making Memento happen for their web servers, content management systems, etc. (http://groups.google.com/group/memento-dev/).

(*) Note that the proposed solution also leverages the OAI-ORE specification (fully compliant with LoD best practice) as a mechanism to support discovery of archived resources.

I hope this helps to get a better understanding of what Memento is about, and what its current status is. Let me end by stating that we would very much like to get these ideas broadly adopted. And we understand we will need a lot of help to make that happen.

Cheers

Herbert

==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
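[The negotiation Herbert describes is easy to exercise from a script. Below is a minimal client sketch in Python, assuming a server that honours the experimental X-Accept-Datetime header (with an HTTP-date value) and redirects to the archived version; the DBpedia URI and date are only illustrative.]

    # Minimal sketch of datetime content negotiation as described above.
    # Assumes a Memento-aware server; everything besides the header name
    # is illustrative.
    import urllib.request

    def get_memento(uri, when):
        req = urllib.request.Request(uri)
        req.add_header("X-Accept-Datetime", when)
        # urllib follows redirects, so the final URL is the archived version.
        with urllib.request.urlopen(req) as resp:
            return resp.geturl(), resp.read()

    final_uri, body = get_memento("http://dbpedia.org/resource/Paris",
                                  "Thu, 20 Mar 2008 00:00:00 GMT")
    print(final_uri)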
Re: RDF Update Feeds + URI time travel on HTTP-level
Hi Chris,

On Fri, Nov 20, 2009 at 1:07 PM, Chris Bizer ch...@bizer.de wrote: [snip]

Sounds cool to me. Anybody an opinion whether this violates general Web architecture somewhere?

IMO, it does. The problem is that an HTTP request with the Accept-Datetime header is logically targeting a different resource than the one identified in the Request-URI. Accept-* headers are for negotiating the selection of resource *representations*, not resources. Resource selection should always be handled via hypermedia.

Mark.
Re: RDF Update Feeds + URI time travel on HTTP-level
2009/11/23 Mark Baker dist...@acm.org: [snip]

IMO, it does. The problem is that an HTTP request with the Accept-Datetime header is logically targeting a different resource than the one identified in the Request-URI. Accept-* headers are for negotiating the selection of resource *representations*, not resources. Resource selection should always be handled via hypermedia.

I think in general it is likely to target a different representation of the same resource, just in the time dimension rather than in the format dimensions that Accept headers currently negotiate over. Arguing that a resource is not different if it has non-equal binary representations in the format dimension at a particular point in time is no different, IMO, to arguing that the nature of the resource has not changed because of one or more intentional, non-nature-affecting changes in one of its binary representations through time.

The use of language as an accept header allows people to select between representations that do not necessarily contain the same information, as the translation might not be complete, or there may be semantic ambiguity that makes it impossible to reliably translate back and forth between the documents without some information loss.

If it is consensus that the time dimension is always a special case where the nature of a resource actually changes whenever the bits change, then I think it would be more appropriate to use different identifying features, such as locators, to retrieve the thing; but currently I think the case is not very convincing, given the current documentation of the Accept possibilities.

In a non-RDF example, one might want to examine the changes in the resolution of an image that may have been improved over time as image resolution algorithms improve. IMO, a more recent document would be the same image, just with more detail. Arguing that the exact dimensions and bit representation of the image have changed, but not the resource, would currently be accepted if the file format changed, because new Accept possibilities can be added without changing the nature of the web resource. However, if the file format didn't change, currently we are not sure, but it seems as though it should be treated as a new image resource. This is a contradiction, IMO, because we have already said that the bit representations can be non-identical and the resulting representations can still identify the same resource, based on the use of Accept headers.

In a semi-serious example, if the resource is strictly different every time something changes, there would be a never-ending circle of updates necessary if two or more documents started out unlinked, but wanted to link to the other documents in the strictest manner possible.
If semi-constant identifiers are not allowed, every time a document was updated the new document would receive a new identifier, which would require an update to the other document if the owners of that document wanted their users to have a link to a document that linked back to them. This update would require a resource locator change, which would then allow the other document producer to update both the link and the resource URI to keep its users up to date. In my opinion it is a very good thing to allow locators to stay semi-constant, as the web architecture documentation might reasonably be thought to represent the real web in some way, which it would not do if this example were taken seriously.

It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time-based checkpoints.

Cheers, Peter
Re: RDF Update Feeds + URI time travel on HTTP-level
On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell ansell.pe...@gmail.com wrote:

It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time-based checkpoints.

You seem to have read more into my argument than was there, and created a strawman; I agree with the above. My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore a request for a representation of the state of Resource X at time T needs to be directed at the URI for "Resource X at time T", not "Resource X".

Mark.
Re: RDF Update Feeds + URI time travel on HTTP-level
2009/11/23 Mark Baker dist...@acm.org: [snip]

You seem to have read more into my argument than was there, and created a strawman; I agree with the above.

I did take some literary privilege. The strawman was intended to be knocked down in the same argument.

My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore a request for a representation of the state of Resource X at time T needs to be directed at the URI for "Resource X at time T", not "Resource X".

The issue with requiring people to direct requests at the URI for "Resource X at time T" is that the circular linking issue I described previously comes into play, because people need to pre-engineer their URIs to be compatible with a temporal dimension. If users didn't know exactly what time scales were used by the server, they would either need to follow a roughly drawn-up convention, such as /YYYY/MM/DD/meaningfulresourcename, or they would have to find an index somewhere, neither of which is as promising for the future of the web as having the ability to add another header to provide the desired behaviour, IMO.

The documentation of the Vary header [1] seems to leave the situation open as to whether the server needs to be concerned about which, or any, headers dictate which resource representation is to be returned. Caching in the context of HTTP/1.1 may have been designed to be temporary, but I see no particular reason why a temporal Accept-* header, together with the possibility of its addition to Vary, couldn't be used in the absolute time dimension. It seems much cleaner than adding an extra method to HTTP, or requiring some other non-HTTP mechanism altogether. The extra header would never stop a server from returning the current version if it doesn't recognise the header, or if it doesn't keep a version history, so it should be completely backwards compatible.

Cheers, Peter

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44
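[Peter's backwards-compatibility argument can be sketched concretely. The following is a hypothetical WSGI handler, not anything from the Memento code: the version store, URI layout and redirect-to-archive behaviour are assumptions; a server that has never heard of the header simply ignores it and serves the current version.]

    # Hypothetical sketch: a server that knows the experimental
    # X-Accept-Datetime header redirects to an archived version and declares
    # the header in Vary; everything about the store and URIs is made up.
    from wsgiref.simple_server import make_server
    from email.utils import parsedate_to_datetime

    # Toy version store: resource path -> list of (HTTP-date, archived URI),
    # oldest first. Purely illustrative.
    VERSIONS = {
        "/page": [
            ("Mon, 01 Jan 2007 00:00:00 GMT", "/archive/2007/page"),
            ("Thu, 03 Jan 2008 00:00:00 GMT", "/archive/2008/page"),
        ],
    }

    def app(environ, start_response):
        path = environ["PATH_INFO"]
        asked = environ.get("HTTP_X_ACCEPT_DATETIME")
        if asked and path in VERSIONS:
            want = parsedate_to_datetime(asked)
            best = None
            # Pick the newest archived version at or before the asked datetime.
            for stamp, uri in VERSIONS[path]:
                if parsedate_to_datetime(stamp) <= want:
                    best = uri
            if best:
                start_response("302 Found", [("Location", best),
                                             ("Vary", "X-Accept-Datetime")])
                return [b""]
        # No header, unknown resource, or nothing old enough: current version.
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Vary", "X-Accept-Datetime")])
        return [b"current representation of " + path.encode()]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()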
Re: RDF Update Feeds
Georgi, Hugh,

Could be very simple by expressing: pull our update-stream once per second/minute/hour in order to be *enough* up-to-date.

Ah, Georgi, I see. You seem to emphasise the quantitative side, whereas I just seem to want to flag what kind of source it is. I agree that "pull our update-stream once per second/minute/hour in order to be *enough* up-to-date" should be available; however, I think that the information regular/irregular vs. how frequent the updates are should be made available as well.

My main use case is motivated from the LOD application-writing area. I figured that I quite often have written code that essentially does the same: based on the type of data source, it either gets a live copy of the data or uses already locally available data. Now, given that data set publishers would declare the characteristics of their datasets in terms of dynamics, one could write such a LOD cache quite easily, I guess, abstracting the necessary steps and hence offering a reusable solution. I'll follow up on this one soon via a blog post with a concrete example.

My main question would be: what do we gain if we explicitly represent these characteristics, compared to what HTTP provides in terms of caching [1]? One might want to argue that the 'built-in' features are sort of too fine-granular and there is a need for a data-source-level solution.

We currently put things like <changefreq>monthly</changefreq>, <changefreq>daily</changefreq> and <changefreq>never</changefreq> in our semantic sitemaps, and these suggestions seem very similar. E.g. http://dotac.rkbexplorer.com/sitemap.xml (and I think these frequencies may correspond to normal sitemaps). So a naive approach, if you want RDF, would be to use something very similar (and simple). Of course I am probably known for my naivety, which is often misplaced.

Hugh, of course you're right (as often ;). Technically, this sort of information ('changefreq') is available via sitemaps. Essentially, one could lift this to RDF straightforwardly, if desired. If you look closely at what I propose, however, then you'll see that I aim at a sort of qualitative description which could drive my LOD cache (along with the other information I already have from the void:Dataset). Now, before I continue to argue here on a purely theoretical level, lemme implement a demo and come back once I have something to discuss ;)

Cheers, Michael

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html

--
Dr. Michael Hausenblas
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

From: Hugh Glaser h...@ecs.soton.ac.uk
Subject: Re: RDF Update Feeds [snip]
Re: RDF Update Feeds
2009/11/21 Michael Hausenblas michael.hausenb...@deri.org: [snip]

Now, given that data set publishers would declare the characteristics of their datasets in terms of dynamics, one could write such a LOD cache quite easily, I guess, abstracting the necessary steps and hence offering a reusable solution.

If you want to do polling based on single resources at regular (i.e., less than daily) intervals, then you are likely to flood the server just looking for potential updates in cases where the server really doesn't know how often a particular resource is going to be updated, such as DBpedia-live, where the update rate is completely reliant on the amount of activity on Wikipedia, which is likely to spike at certain times, then even out, and possibly drop off for months at a time.

Using a change feed with clients polling once per period on a sliding-window feed will break down whenever the temporary update rate is so fast that a full window of the feed passes before clients do consecutive polls on the update feed. There is no way to guarantee what the maximum update rate for DBpedia-live is, for example, so the published update rate would have to simply be as often as the server can handle, based on the size of the RSS file required to publish information about which resources have been recently updated. The main reason that RSS isn't useful for consistency, IMO, is that it relies on clients updating very regularly; otherwise they actually miss out permanently on information, and the RSS reader application contains only a limited set of what was really published on the feed.

The mechanism that DBpedia-live uses to monitor Wikipedia might be a candidate; however, it still suffers from issues with clients dropping out for periods of time and either missing updates or causing large spikes when they come back online. If clients do not receive the notifications for a day on DBpedia-live, could they possibly catch up without performing a DoS on the server trying to poll all of the announcements that they missed out on?

If this is going to work and minimise bandwidth usage, there needs to be some mechanism to enable clients to check if information is newer than the cached information without any actual RDF information being transferred. Currently RDF databases don't support this, and it is particularly hard to support where the GRAPH used in the database is not meant to be a single document, such as http://dbpedia.org on DBpedia.

Cheers, Peter
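[For the "check without transferring any RDF" requirement, plain HTTP conditional requests already get part of the way, per document rather than per dataset. A sketch, assuming the server emits validators (ETag/Last-Modified) and honours the matching conditional headers, which many web servers do but SPARQL endpoints and quad stores generally do not.]

    # Minimal conditional-GET sketch: re-fetch an RDF document only if it
    # changed since the last poll; a 304 response transfers no RDF at all.
    import urllib.request, urllib.error

    def fetch_if_changed(url, etag=None, last_modified=None):
        req = urllib.request.Request(url)
        if etag:
            req.add_header("If-None-Match", etag)
        if last_modified:
            req.add_header("If-Modified-Since", last_modified)
        try:
            with urllib.request.urlopen(req) as resp:
                return (resp.read(),
                        resp.headers.get("ETag"),
                        resp.headers.get("Last-Modified"))
        except urllib.error.HTTPError as e:
            if e.code == 304:          # unchanged: nothing transferred
                return None, etag, last_modified
            raise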
Re: RDF Update Feeds + URI time travel on HTTP-level
On 20 Nov 2009, at 19:07, Chris Bizer wrote:

just to complete the list of proposals, here another one from Herbert Van de Sompel from the Open Archives Initiative. Memento: Time Travel for the Web http://arxiv.org/abs/0911.1112 The idea of Memento is to use HTTP content negotiation in the datetime dimension. By using a newly introduced X-Accept-Datetime HTTP header they add a temporal dimension to URIs. The result is a framework in which archived resources can seamlessly be reached via the URI of their original.

Interesting! It seems to be most useful for "time travelling" on the web, and would allow me to browse the web as it was at some point in the past, similar to the Wayback Machine [1]. Unlike the Wayback Machine, it would work without a central archive, but only on those servers that implement the proposal, and only with a browser/client that supports the feature.

I don't immediately see how this could be used to synchronize updates between datasets, though. Being able to access past versions of URIs doesn't tell me what has changed throughout the site between then and today.

Sounds cool to me. Anybody an opinion whether this violates general Web architecture somewhere?

From a web architecture POV it seems pretty solid to me. Doing stuff via headers is considered bad if you could just as well do it via links and additional URIs, but you can argue that the time dimension is such a universal thing that a header-based solution is warranted.

The main drawback IMO is that existing clients, such as all web browsers, will be unable to access the archived versions, because they don't know about the header. If you are archiving web pages or RDF documents, then you could add links that lead clients to the archived versions, but that won't work for images, PDFs and so forth.

In summary, I think it's pretty cool. Anyone who has used Apple's Time Machine would probably get a kick out of the idea of doing the same on a web page, zooming into the past on a Wikipedia page or on GitHub or on a weather site. But if you're only interested in doing something for a single site, then an ad-hoc solution based on URIs for old versions is probably more practical.

Best, Richard

[1] http://www.archive.org/web/web.php
Re: RDF Update Feeds
Hello!

Back in April, we had a similar discussion: http://lists.w3.org/Archives/Public/public-lod/2009Apr/0130.html

Concretely, we are having exactly the same problem for syncing up aggregations of BBC RDF data (Talis's and OpenLink's), as our data changes *a lot*. Right now, we're thinking about a really simple feed, detailing a) if a change event is a delete, an update or a create, and b) what thing has changed. That's a start, but should be enough to sync up with our data.

Cheers, y

2009/11/18 Niklas Lindström lindstr...@gmail.com: [snip]
Re: RDF Update Feeds
Georgi, All,

I like the discussion, and as it seems to be a recurrent pattern, as pointed out by Yves (which might be a sign that we need to invest some more time into it), I've tried to sum up a bit and started a straw-man proposal for a more coarse-grained solution [1]. Looking forward to hearing what you think ...

Cheers, Michael

[1] http://esw.w3.org/topic/DatasetDynamics

--
Dr. Michael Hausenblas
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

From: Georgi Kobilarov georgi.kobila...@gmx.de
Subject: RDF Update Feeds [snip]
Re: RDF Update Feeds
At the Library of Congress we've been experimenting with using an Atom feed to alert subscribers to new resources available at id.loc.gov [1]. The approach is similar to what Niklas is doing, although we kind of independently arrived at this approach (which was nice to discover).

Creates, updates and deletes happen on a weekly basis, so it's important for us to let interested parties know what has changed. We ended up using Atom Tombstones [2] for representing the deletes, and Atom Feed Paging and Archiving (RFC 5005) [3] to allow clients to drill backwards through time. I just noticed Link Relations for Simple Version Navigation [4] get announced on an Atom-related discussion list, which looks like it could be useful as well, if you maintain a version history.

I'd be interested in any feedback anyone has about using this approach.

//Ed

[1] http://id.loc.gov/authorities/feed/
[2] http://ietfreport.isoc.org/all-ids/draft-snell-atompub-tombstones-06.txt
[3] http://tools.ietf.org/rfc/rfc5005.txt
[4] http://www.ietf.org/id/draft-brown-versioning-link-relations-03.txt
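[A consumer of such a feed might look roughly like the sketch below. The feed URL is the one Ed gives; the tombstone namespace comes from the Atom Tombstones draft and the rel="prev-archive" traversal from RFC 5005; the exact shape of the id.loc.gov feed beyond that is assumed.]

    # Rough sketch of a client for an Atom feed with RFC 5005 paging and
    # Atom Tombstones. Namespaces are from the Atom spec and the tombstones
    # I-D; feed details are assumptions.
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"
    AT = "{http://purl.org/atompub/tombstones/1.0}"

    def harvest(feed_url):
        """Yield ('updated'|'deleted', entry id) pairs, walking back in time."""
        while feed_url:
            with urllib.request.urlopen(feed_url) as f:
                root = ET.parse(f).getroot()
            for entry in root.findall(ATOM + "entry"):
                yield "updated", entry.findtext(ATOM + "id")
            for tomb in root.findall(AT + "deleted-entry"):
                yield "deleted", tomb.get("ref")
            # RFC 5005: follow rel="prev-archive" to older pages, if any.
            feed_url = next((l.get("href") for l in root.findall(ATOM + "link")
                             if l.get("rel") == "prev-archive"), None)

    for action, ident in harvest("http://id.loc.gov/authorities/feed/"):
        print(action, ident)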
Re: RDF Update Feeds
Hi Michael,

Michael Hausenblas wrote: [snip]

Thanks for setting this up. To me, it is not only the dynamics of the data that matters, but also the ability to get notified of changes, to track the changes, to find out what has been changed, and to find explanations and evidence justifying the changes. I don't think /dynamics/ could cover all of these. Would the vocabulary in [1] also consider use cases other than dynamics?

It seems that some of the above use cases have been discussed somehow in the previous threads. I would be very interested to see them continued :).

Cheers, Jun

[1] http://esw.w3.org/topic/DatasetDynamics
Re: RDF Update Feeds
Ed Summers wrote: [snip]

In a nutshell, +1 for this approach.

--
Regards,
Kingsley Idehen
Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software
Web: http://www.openlinksw.com
Re: RDF Update Feeds
Kingsley Idehen wrote:

Ed Summers wrote: [snip]

In a nutshell, +1 for this approach.

Is this not the same as (or very similar to) the COURT approach outlined here: http://code.google.com/p/court/ by Niklas?
Re: RDF Update Feeds
On Fri, Nov 20, 2009 at 11:05 AM, Nathan nat...@webr3.org wrote:

Is this not the same as (or very similar to) the COURT approach outlined here: http://code.google.com/p/court/ by Niklas?

Yes, absolutely. Although I had no idea of Niklas's work at the time. That's why I said: "The approach is similar to what Niklas is doing, although we kind of independently arrived at this approach (which was nice to discover)." :-)

//Ed
Re: RDF Update Feeds
Ed Summers wrote: [snip]

Nathan / Niklas,

+1 for both, and a nice showcase re. serendipitous collective intelligence :-)

--
Regards,
Kingsley Idehen
Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software
Web: http://www.openlinksw.com
RE: RDF Update Feeds
Hi Michael,

nice write-up on the wiki! But I think the vocabulary you're proposing is too generally descriptive. Dataset publishers, once offering update feeds, should not only tell that/if their datasets are dynamic, but how dynamic they are. Could be very simple by expressing: pull our update-stream once per second/minute/hour in order to be *enough* up-to-date.

Makes sense?

Cheers, Georgi

--
Georgi Kobilarov
www.georgikobilarov.com

-----Original Message-----
From: Michael Hausenblas [mailto:michael.hausenb...@deri.org]
Subject: Re: RDF Update Feeds [snip]
Re: RDF Update Feeds + URI time travel on HTTP-level
Hi Michael, Georgi and all,

just to complete the list of proposals, here is another one, from Herbert Van de Sompel of the Open Archives Initiative.

Memento: Time Travel for the Web
http://arxiv.org/abs/0911.1112

The idea of Memento is to use HTTP content negotiation in the datetime dimension. By using a newly introduced X-Accept-Datetime HTTP header, they add a temporal dimension to URIs. The result is a framework in which archived resources can seamlessly be reached via the URI of their original.

Sounds cool to me. Anybody an opinion whether this violates general Web architecture somewhere? Anybody aware of other proposals that work on HTTP-level?

Have a nice weekend,

Chris

-----Ursprüngliche Nachricht-----
Von: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] Im Auftrag von Georgi Kobilarov [snip]
Re: RDF Update Feeds
Sorry if I have missed something, but...

We currently put things like <changefreq>monthly</changefreq>, <changefreq>daily</changefreq> and <changefreq>never</changefreq> in our semantic sitemaps, and these suggestions seem very similar. E.g. http://dotac.rkbexplorer.com/sitemap.xml (and I think these frequencies may correspond to normal sitemaps). So a naive approach, if you want RDF, would be to use something very similar (and simple). Of course I am probably known for my naivety, which is often misplaced.

Best

Hugh

On 20/11/2009 17:47, Georgi Kobilarov georgi.kobila...@gmx.de wrote: [snip]
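[Hugh's sitemap route is straightforward to consume programmatically; a sketch that reads the standard changefreq hints from the sitemap he links. Only the standard sitemap namespace is handled; the semantic-sitemap extension elements would need their own namespace handling, which is omitted here.]

    # Sketch: read per-URL changefreq hints from an ordinary sitemap.
    import urllib.request
    import xml.etree.ElementTree as ET

    SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

    def changefreqs(sitemap_url):
        with urllib.request.urlopen(sitemap_url) as f:
            root = ET.parse(f).getroot()
        for url in root.findall(SM + "url"):
            yield url.findtext(SM + "loc"), url.findtext(SM + "changefreq")

    for loc, freq in changefreqs("http://dotac.rkbexplorer.com/sitemap.xml"):
        print(freq or "unspecified", loc)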
Re: RDF Update Feeds
Hi,

On 17 Nov 2009, at 15:45, Georgi Kobilarov wrote:

Hi all, I'd like to start a discussion about a topic that I think is getting increasingly important: RDF update feeds. The linked data project is starting to move away from releases of large data dumps towards incremental updates. But how can services consuming RDF data from linked data sources get notified about changes?

What about using RSS feeds (w/ RDF extensions) combined with RSSCloud [1] or PubSubHubbub [2]?

Best,

Alex.

[1] http://rsscloud.org/
[2] http://code.google.com/p/pubsubhubbub/

Is anyone aware of activities to standardize such RDF update feeds, or at least aware of projects already providing any kind of update feed at all? And related to that: How do we deal with RDF diffs?

Cheers, Georgi

--
Dr. Alexandre Passant
Digital Enterprise Research Institute
National University of Ireland, Galway
:me owl:sameAs http://apassant.net/alex .
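[For the push option Alexandre mentions, a PubSubHubbub subscription is just a form-encoded POST to a hub. A sketch against the 0.x spec; all three URLs are illustrative placeholders.]

    # Sketch of a PubSubHubbub (0.x) subscription request: the subscriber
    # asks the hub to POST new feed content to its callback whenever the
    # topic feed changes. The URLs below are illustrative placeholders.
    import urllib.parse, urllib.request

    def subscribe(hub, topic, callback):
        data = urllib.parse.urlencode({
            "hub.mode": "subscribe",
            "hub.topic": topic,        # the update feed to watch
            "hub.callback": callback,  # where the hub pushes updates
            "hub.verify": "async",     # hub verifies intent via the callback
        }).encode()
        with urllib.request.urlopen(hub, data) as resp:
            # 202 Accepted: verification pending; 204: verified synchronously.
            return resp.status

    print(subscribe("http://pubsubhubbub.appspot.com/",
                    "http://example.org/dataset/updates.atom",
                    "http://consumer.example.org/push-callback"))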
Re: RDF Update Feeds
Georgi Kobilarov wrote: [snip]

Is anyone aware of activities to standardize such RDF update feeds, or at least aware of projects already providing any kind of update feed at all? And related to that: How do we deal with RDF diffs?

After thinking about this (perhaps a bit naive myself, as I'm still new), I can't see how this is too complex; in fact, IMHO all the existing ways of handling updates for RSS, Atom etc. seem a bit overkill to me.

An update (or changeset, as I'm thinking about it) is essentially nothing more than "this triple has been removed and this one has been added" - on a triple level we don't have an update, it's very much the equivalent of replace; thus an update for a single triple is a case of "remove old triple, insert new one". And thus, without thinking about technologies, all I can see we are left with is something as simple as:

- s1 p1 o1
+ s2 p2 o2

I guess even something like N3 could be extended to accommodate this; given the following example:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix swp: <http://semanticweb.org/id/Property-3A> .
@prefix swc: <http://semanticweb.org/id/Category-3A> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix swivt: <http://semantic-mediawiki.org/swivt/1.0#> .
@prefix sw: <http://semanticweb.org/id/> .

sw:ESWC2010 swp:Title "7th Extended Semantic Web Conference"^^<http://www.w3.org/2001/XMLSchema#string> ;
  rdfs:label "ESWC2010" ;
  a swc:Conference ;
  swp:Event_in_series wiki:ESWC ;
  foaf:homepage <http://www.eswc2010.org> ;
  swp:Has_location_city sw:Heraklion ;
  swp:Has_location_country sw:Greece ;
  swp:Start_date "2010-05-30T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  swp:End_date "2010-06-03T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  swp:Abstract_deadline "2009-12-15T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  swp:Paper_deadline "2009-12-22T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  swp:Notification "2010-02-24T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  swp:Camera_ready_due "2010-03-10T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  rdfs:isDefinedBy <http://semanticweb.org/wiki/Special:ExportRDF/ESWC2010> ;
  swivt:page <http://semanticweb.org/wiki/ESWC2010> .

one could easily add an operator prefix to signify inserts and deletes; in the following example we change the dates of the conference:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix swp: <http://semanticweb.org/id/Property-3A> .
@prefix sw: <http://semanticweb.org/id/> .

- sw:ESWC2010 swp:Start_date "2010-05-30T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
    swp:End_date "2010-06-03T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
+ sw:ESWC2010 swp:Start_date "2010-06-01T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
    swp:End_date "2010-06-04T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .

Once you've got the notation or concept down, then everything else will fall into place; we can create update streams, or release changesets at X interval, notify by ping, or poll, or whatever.
I dare say you could even handle the same thing in RDF itself by having the graph IRI on the left, making up a quick ontology with, say, rdfu:add and rdfu:delete, and storing the triples as an XML literal on the right, so: graph_iri rdfu:add rdfpacket .

<http://domain.org/mygraph> rdfu:add """<rdf:RDF
    xmlns:log="http://www.w3.org/2000/10/swap/log#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:sw="http://semanticweb.org/id/"
    xmlns:swp="http://semanticweb.org/id/Property-3A">
  <rdf:Description rdf:about="http://semanticweb.org/id/ESWC2010">
    <sw:Property-3AEnd_date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-06-03T00:00:00</sw:Property-3AEnd_date>
    <sw:Property-3AStart_date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-05-30T00:00:00</sw:Property-3AStart_date>
  </rdf:Description>
</rdf:RDF>"""^^rdf:XMLLiteral .

As for implementing: if server X were to build up a changeset in this form and release it daily/hourly/incrementally, and server Y could consume and handle these changesets, then we'd be about done as far as I can see?

Reminder: I am very new to this, so if it's all way off - please disregard.

Regards, Nathan
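[Nathan's +/- changeset can be computed mechanically from two versions of a graph. A sketch using rdflib, which is an assumption here, not part of the proposal; any store that can hand back triples as a set would work the same way. It emits removals and additions one statement per line, in his notation.]

    # Sketch of computing a "+/-" changeset between two graph versions.
    from rdflib import Graph

    def changeset(old_path, new_path):
        old, new = Graph(), Graph()
        old.parse(old_path, format="n3")
        new.parse(new_path, format="n3")
        removed = set(old) - set(new)   # triples only in the old version
        added = set(new) - set(old)     # triples only in the new version
        lines = ["- %s %s %s ." % (s.n3(), p.n3(), o.n3())
                 for s, p, o in sorted(removed)]
        lines += ["+ %s %s %s ." % (s.n3(), p.n3(), o.n3())
                  for s, p, o in sorted(added)]
        return "\n".join(lines)

    # e.g. print(changeset("eswc2010-old.n3", "eswc2010-new.n3"))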
Re: RDF Update Feeds
Nathan wrote: Georgi Kobilarov wrote: Hi all, I'd like to start a discussion about a topic that I think is getting increasingly important: RDF update feeds. The linked data project is starting to move away from releases of large data dumps towards incremental updates. But how can services consuming rdf data from linked data sources get notified about changes? Is anyone aware of activities to standardize such rdf update feeds, or at least aware of projects already providing any kind of update feed at all? And related to that: How do we deal with RDF diffs? After thinking about this (perhaps a bit naive myself as still new) I can't see how this is too complex, infact imho all the existing ways of handling updates for rss, atom etc seem a bit over kill to me. an update (or changeset as I'm thinking about it) is essentially nothing more than this triple has been removed and this one has been added - on a triple level we don't have a update, it's very much the equivalent of replace; thus an update for a single triple is a case of remove old triple, insert new one. and thus, without thinking about technologies, all I can see we are left with is as simple as: - s1 p1 o1 + s2 p2 o2 i guess even something like n3 could be extended to accommodate this: given the following example @prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# . @prefix foaf: http://xmlns.com/foaf/0.1/ . @prefix owl: http://www.w3.org/2002/07/owl# . @prefix swp: http://semanticweb.org/id/Property-3A . @prefix swc: http://semanticweb.org/id/Category-3A . @prefix rdfs: http://www.w3.org/2000/01/rdf-schema# . @prefix swivt: http://semantic-mediawiki.org/swivt/1.0# . @prefix sw: http://semanticweb.org/id/ . sw:ESWC2010 swp:Title 7th Extended Semantic Web Conference^^http://www.w3.org/2001/XMLSchema#string ; rdfs:label ESWC2010 ; a swc:Conference ; swp:Event_in_series wiki:ESWC ; foaf:homepage http://www.eswc2010.org ; swp:Has_location_city sw:Heraklion ; swp:Has_location_country sw:Greece ; swp:Start_date 2010-05-30T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:End_date 2010-06-03T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:Abstract_deadline 2009-12-15T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:Paper_deadline 2009-12-22T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:Notification 2010-02-24T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:Camera_ready_due 2010-03-10T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; rdfs:isDefinedBy http://semanticweb.org/wiki/Special:ExportRDF/ESWC2010 ; swivt:page http://semanticweb.org/wiki/ESWC2010 . one could easily add in an operator prefix to signify inserts and deletes; in the following example we change the dates of the conference @prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# . @prefix swp: http://semanticweb.org/id/Property-3A . @prefix sw: http://semanticweb.org/id/ . - sw:ESWC2010 swp:Start_date 2010-05-30T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:End_date 2010-06-03T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime . + sw:ESWC2010 swp:Start_date 2010-06-01T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:End_date 2010-06-04T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime . once you've got the notation or concept down then everything else will fall in to place; we can create update streams, or release change sets on X interval, notify by ping, or poll or whatever. 
I dare say you could even handle the same thing in RDF itself, by having the graph IRI on the left, making up a quick ontology with, say, rdfu:add and rdfu:delete, and storing the triples as an XML literal on the right, so: graph_iri rdfu:add rdfpacket . For example:

  <http://domain.org/mygraph> rdfu:add """<rdf:RDF
      xmlns:log="http://www.w3.org/2000/10/swap/log#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:sw="http://semanticweb.org/id/"
      xmlns:swp="http://semanticweb.org/id/Property-3A">
    <rdf:Description rdf:about="http://semanticweb.org/id/ESWC2010">
      <sw:Property-3AEnd_date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-06-03T00:00:00</sw:Property-3AEnd_date>
      <sw:Property-3AStart_date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-05-30T00:00:00</sw:Property-3AStart_date>
    </rdf:Description>
  </rdf:RDF>"""^^rdf:XMLLiteral .

As for implementing: if one server were to build up a changeset in this form and release it daily/hourly/incrementally, and another server could consume and handle these changesets, then we'd be about done, as far as I can see? Reminder: I am very new to this, so if it's all way off, please disregard.

Sorry, it's late and I forgot to write half the email :-( As for changeset management: I thought changesets could be published through HTTP and pulled using the If-Modified-Since header. 1: client
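To make the remove-then-add semantics above concrete, here is a minimal sketch in Python; the "+/-" line format and the helper name are my own illustration, not anything specified in the thread:

    # Sketch (not from the thread) of applying a line-based "+/-"
    # triple changeset to an in-memory set of triples. Parsing is
    # deliberately naive: no literal handling, no prefixes.
    def apply_changeset(triples, changeset_lines):
        """Apply removals before additions (remove-then-add semantics)."""
        removals, additions = [], []
        for line in changeset_lines:
            line = line.strip()
            if not line:
                continue
            op, statement = line[0], line[1:].strip().rstrip(".").strip()
            triple = tuple(statement.split(None, 2))
            if op == "-":
                removals.append(triple)
            elif op == "+":
                additions.append(triple)
            else:
                raise ValueError("unknown operator: " + op)
        for t in removals:
            triples.discard(t)
        for t in additions:
            triples.add(t)
        return triples

    graph = {("sw:ESWC2010", "swp:Start_date", '"2010-05-30T00:00:00"')}
    apply_changeset(graph, [
        '- sw:ESWC2010 swp:Start_date "2010-05-30T00:00:00" .',
        '+ sw:ESWC2010 swp:Start_date "2010-06-01T00:00:00" .',
    ])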
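And a minimal sketch of the pull side just described: a client polling a changeset URL with If-Modified-Since, using only the Python standard library. The changeset URL is a hypothetical placeholder:

    # Poll a changeset resource over HTTP; a 304 means nothing new.
    import urllib.request
    import urllib.error

    CHANGESET_URL = "http://example.org/data/changesets/latest"  # hypothetical

    def fetch_if_modified(last_modified=None):
        """Return (body, new_last_modified), or (None, last_modified) on 304."""
        req = urllib.request.Request(CHANGESET_URL)
        if last_modified:
            req.add_header("If-Modified-Since", last_modified)
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read(), resp.headers.get("Last-Modified", last_modified)
        except urllib.error.HTTPError as e:
            if e.code == 304:  # not modified: nothing new to apply
                return None, last_modified
            raise

    body, stamp = fetch_if_modified()        # first poll: full fetch
    body, stamp = fetch_if_modified(stamp)   # later polls: 304 if unchanged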
Re: RDF Update Feeds
Hi Nathan!

2009/11/17 Nathan nat...@webr3.org:
> very short non-detailed reply from me!

I appreciate it.

> pub/sub, atom feeds, RDF over XMPP were my initial thoughts on the matter last week - essentially triple (update/publish) streams on a pub/sub basis, decentralized suitably, [snip] then my thoughts switched to the fact that RDF is not XML (or any other serialized format) so to keep it non-limited I guess the concept would need to be specified first then implemented in whatever formats/ways people saw fit, as has been the case with RDF.

I agree that the concept should really be format-independent. But I think it has to be pragmatic and operation-oriented, to avoid never getting there. Atom (feed paging and archiving) is basically designed with exactly this in mind, and it scaled to my use-cases (resources with multiple representations, plus optional attachments), while still being simple enough to work for just RDF updates. The missing piece is the deleted-entry/tombstone, for which there is thankfully at least an I-D. Therefore modelling the approach around these possibilities required a minimum of invention (none really, just some wording to describe the practice), and it seems suited for a wide range of dataset syndication scenarios (not so much real-time, where XMPP may be relevant).

At least this works very well as long as the datasets can be sensibly partitioned into documents (contexts/graphs). But this is, IMHO, the best way to manage RDF anyhow (not least since one can also leverage simple REST principles for editing, and since quad-stores/SPARQL endpoints support named contexts, etc.). But I'd gladly discuss the benefit/drawback ratio of this approach in relation to our and others' scenarios.

(I do think it would be nice to lift the resulting timeline to proper RDF -- e.g. AtomOwl (plus a Deletion for tombstones, provenance and logging, etc.). But these rather complex concepts -- datasources (dataset vs. collection vs. feed vs. page), timelines (entries are *events* for the same resource over time), flat resource manifest concepts, and so on -- require semantic definitions which will probably continue to be debated for quite some time! Atom can be leveraged right now. After all, this is a *very* instrumental aspect for most domains.)

> this subject is probably not something that should be left for long though.. my (personal) biggest worry about 'linked data' is that junk data will be at an all time high, if not worse, and not nailing this on the head early on (as in weeks/months at max) could contribute to the mess considerably.

Couldn't agree with you more. A common, direct (and simple enough) way of syndicating datasets over time would be very beneficial, and shared practices for that seem to be lacking today.

COURT http://purl.org/net/court is publicly much of a strawman right now, but I would like to flesh it out: primarily regarding the use of Atom I've described, but also with details of our implementation (the Swedish legal information system), concerning collection and storage, proposed validation and URI-minting/verifying strategies, lifting the timeline for logging, etc. (In what form and where the project's actual source code will be public remains to be decided (though open-sourcing it has always been the official plan). Time permitting, I will push my own work in the same vein there for reuse and reference. Regardless, I trust the approach to be simple enough to be implementable from reading this mail-thread alone. ;) )

Best regards, Niklas Lindström
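As a concrete illustration of consuming such a feed, a small Python sketch that walks an Atom feed mixing entries and tombstones. It assumes the at:deleted-entry element and namespace from the tombstones I-D mentioned above (eventually published as RFC 6721); the feed URL is a placeholder:

    # Walk an Atom feed, yielding update and delete events in feed order.
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"
    AT = "{http://purl.org/atompub/tombstones/1.0}"  # tombstones namespace

    def read_feed(url):
        """Yield ('updated', entry_id) and ('deleted', ref) events."""
        with urllib.request.urlopen(url) as resp:
            root = ET.parse(resp).getroot()
        for child in root:
            if child.tag == ATOM + "entry":
                yield "updated", child.findtext(ATOM + "id")
            elif child.tag == AT + "deleted-entry":
                yield "deleted", child.get("ref")

    for event, resource in read_feed("http://example.org/dataset/feed.atom"):
        print(event, resource)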
RDF Update Feeds
Hi all, I'd like to start a discussion about a topic that I think is getting increasingly important: RDF update feeds. The linked data project is starting to move away from releases of large data dumps towards incremental updates. But how can services consuming rdf data from linked data sources get notified about changes? Is anyone aware of activities to standardize such rdf update feeds, or at least aware of projects already providing any kind of update feed at all? And related to that: How do we deal with RDF diffs? Cheers, Georgi -- Georgi Kobilarov www.georgikobilarov.com
Re: RDF Update Feeds
On Tue, 2009-11-17 at 16:45 +0100, Georgi Kobilarov wrote: How do we deal with RDF diffs?

Talis' changeset vocab is a good start: http://n2.talis.com/wiki/Changesets

It has enough detail for changes to be rewound, replayed, etc.

-- Toby A Inkster mailto:m...@tobyinkster.co.uk http://tobyinkster.co.uk
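For illustration, a minimal Python sketch of building such a changeset with rdflib (rdflib is my choice here, not something the thread prescribes; the cs: terms are recalled from the Changesets wiki page above and should be verified against it):

    # Build a Talis-style changeset: reified removal/addition statements
    # hung off a cs:ChangeSet node, so the change can be rewound/replayed.
    from rdflib import Graph, Namespace, Literal, BNode
    from rdflib.namespace import RDF, XSD

    CS = Namespace("http://purl.org/vocab/changeset/schema#")  # assumed
    SW = Namespace("http://semanticweb.org/id/")
    SWP = Namespace("http://semanticweb.org/id/Property-3A")

    def reify(g, s, p, o):
        """Add an rdf:Statement node describing triple (s, p, o)."""
        st = BNode()
        g.add((st, RDF.type, RDF.Statement))
        g.add((st, RDF.subject, s))
        g.add((st, RDF.predicate, p))
        g.add((st, RDF.object, o))
        return st

    g = Graph()
    change = BNode()
    g.add((change, RDF.type, CS.ChangeSet))
    g.add((change, CS.subjectOfChange, SW.ESWC2010))
    g.add((change, CS.changeReason, Literal("Conference dates moved")))
    old = Literal("2010-05-30T00:00:00", datatype=XSD.dateTime)
    new = Literal("2010-06-01T00:00:00", datatype=XSD.dateTime)
    g.add((change, CS.removal, reify(g, SW.ESWC2010, SWP.Start_date, old)))
    g.add((change, CS.addition, reify(g, SW.ESWC2010, SWP.Start_date, new)))
    print(g.serialize(format="turtle"))  # str in rdflib 6+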
Re: RDF Update Feeds
Georgi Kobilarov wrote: [...] And related to that: How do we deal with RDF diffs?

There have been a few suggestions over the years. [1] immediately jumps to mind, for example.

Would SPARQL Update work as a patch format? Generating it might be tricky, and I'm not sure I fancy running syndicated updates without checking them first. On the other hand, I interact with larger stores using SPARQL, so simply recording the updates sent would work.

Damian

[1] http://www.w3.org/DesignIssues/Diff
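A Python sketch of the "record the updates sent" idea: the same SPARQL Update string serves as both the patch and the log entry. The endpoint URL is a placeholder, and POSTing application/sparql-update follows the SPARQL 1.1 protocol, which postdates this thread:

    # Apply a SPARQL Update to a store, then append it to a replayable log.
    import urllib.request

    ENDPOINT = "http://example.org/sparql/update"  # hypothetical

    patch = """
    PREFIX swp: <http://semanticweb.org/id/Property-3A>
    PREFIX sw:  <http://semanticweb.org/id/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    DELETE DATA { sw:ESWC2010 swp:Start_date "2010-05-30T00:00:00"^^xsd:dateTime } ;
    INSERT DATA { sw:ESWC2010 swp:Start_date "2010-06-01T00:00:00"^^xsd:dateTime }
    """

    def apply_and_log(update, logfile="updates.log"):
        req = urllib.request.Request(
            ENDPOINT, data=update.encode("utf-8"),
            headers={"Content-Type": "application/sparql-update"})
        urllib.request.urlopen(req)              # raises on HTTP errors
        with open(logfile, "a") as log:
            log.write(update.strip() + "\n;\n")  # ';' separates operations

    apply_and_log(patch)

Replaying the log file against a second store then reproduces the first store's state, which is Damian's point about recording updates.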
Re: RDF Update Feeds
Damian Steer wrote: There have been a few suggestions over the years. [1] immediately jumps to mind, for example.

We have also integrated functionality for publishing Linked Data updates in Triplify [1]. It's similar to Talis' changeset approach, but works more like publishing a hierarchically structured update log as Linked Data itself. Details can be found here: http://triplify.org/vocabulary/update

Sören

[1] http://triplify.org/

-- Sören Auer, AKSW/Computer Science Dept., University of Leipzig http://www.informatik.uni-leipzig.de/~auer, Skype: soerenauer
Re: RDF Update Feeds
Hi,

We are working on this issue with our DSNotify [1] approach. Our solution is based on indexing subgraphs of available LD graphs and deriving feature vectors (FVs) for each indexed resource. By comparing the sets of newly detected, recently removed, and indexed FVs, we can detect create, remove, update, and move [2] events in LD sources. These events are logged and can be accessed via a Java API, an XML-RPC interface, and an HTTP interface.

We are also developing a vocabulary (and a corresponding API) that can be used to describe so-called eventsets: sets of events that occurred in a particular data source. This vocab is based on LODE and SCOVO, and a first draft will be published soon on our website.

But DSNotify is not ready to index the whole Web of Data. It may rather be used as an add-on for particular data providers that want to keep a high level of link integrity in their data (because the reported events may be used by the data provider to update its hosted data/links).

Other related approaches:
- Triplify's Linked Data Update Log [3]
- Silk's Web of Data Link Maintenance Protocol [4]

best regards, Niko

[1] http://dsnotify.org/
[2] The main purpose of DSNotify is to detect move events in data sources, i.e., when resources are published under different identifiers (e.g., under a different HTTP URI). Although this should not happen in theory (URIs should be cool), it happens quite often in reality; see our paper for details.
[3] http://triplify.org/vocabulary/update
[4] http://www4.wiwiss.fu-berlin.de/bizer/silk/wodlmp/
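A toy Python sketch of the general idea behind this kind of move detection: match resources that disappeared between two index snapshots to newly appeared ones by feature-vector similarity. This is an illustration only, not DSNotify's actual algorithm, features, or thresholds:

    # Classify create/remove/move events between two {uri: fv} snapshots,
    # where each feature vector is a sparse dict of feature weights.
    from math import sqrt

    def cosine(a, b):
        """Cosine similarity of two sparse feature vectors (dicts)."""
        dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
        norm = (sqrt(sum(v * v for v in a.values()))
                * sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def detect_events(old_index, new_index, threshold=0.9):
        removed = {u: fv for u, fv in old_index.items() if u not in new_index}
        created = {u: fv for u, fv in new_index.items() if u not in old_index}
        events = []
        for old_uri, old_fv in removed.items():
            best = max(created, key=lambda u: cosine(old_fv, created[u]),
                       default=None)
            if best and cosine(old_fv, created[best]) >= threshold:
                events.append(("move", old_uri, best))  # same thing, new URI
                created.pop(best)
            else:
                events.append(("remove", old_uri, None))
        events += [("create", u, None) for u in created]
        return events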
Re: RDF Update Feeds
[...] links I can attach more RDF partitioned to our needs/restrictions, and just wipe entries if they become too large and publish new repartitioned resources carrying RDF. (In theory this also means that the central system can be replaced with a PURL-like redirector, if the agency websites could be deemed persistent over time (which they currently cannot).)

== Other approaches ==

* The Library of Congress has similar Atom feeds and tombstones for their subject headings: http://id.loc.gov/authorities/feed/ (paged feeds; no explicit archives that I'm aware of, so I'm not sure about the collectability of the entire dataset over time -- this can be achieved with regular paging if you're sure you won't drop items when climbing as the dataset is updated).

* The OAI-PMH http://www.openarchives.org/pmh/ is an older effort with good specifications (though not as RESTful as e.g. Atom, GData, etc.). I'm interested in seeing if they'd be interested in something like COURT as well, since they went for Atom (and RDF) in their OAI-ORE specs http://www.openarchives.org/ore/ ..

* You can use Sitemap extensions http://sw.deri.org/2007/07/sitemapextension/ to expose lists of archive dumps (e.g. http://products.semweb.bestbuy.com/sitemap.xml), which could be crawled incrementally. But I don't know how to easily do deletes without recollecting it all..

* The COURT approach of our system has a rudimentary ping feature so that sources can notify the collector of updated feeds. This could of course be improved by using PubSubHubbub http://pubsubhubbub.googlecode.com/svn/trunk/pubsubhubbub-core-0.2.html, but that's currently not a priority for us.

Best regards, Niklas Lindström

PS. Anyone interested in this COURT approach, *please* contact me; I am looking for ways to formalize this for easy reuse, not least for disseminating government and other open data in a uniform manner -- both on a specification/recommendation level, and for gathering implementations (possibly built upon existing frameworks/content repositories/CMSes).
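For the paging caveat above, a small Python sketch of climbing a paged Atom feed by following rel="next" links (RFC 5005). Archived feeds (rel="prev-archive") avoid the dropped-item problem when the dataset changes mid-climb; the same function works for those by changing the rel argument:

    # Climb a paged Atom feed, yielding entry ids page by page.
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"

    def climb(url, rel="next"):
        """Follow feed-level rel links, yielding entry ids."""
        while url:
            with urllib.request.urlopen(url) as resp:
                root = ET.parse(resp).getroot()
            for entry in root.findall(ATOM + "entry"):
                yield entry.findtext(ATOM + "id")
            # findall on the root only matches feed-level links,
            # not the links nested inside entries.
            url = next((l.get("href") for l in root.findall(ATOM + "link")
                        if l.get("rel") == rel), None)

    # Example using the LoC feed mentioned above (may have changed since).
    seen = set(climb("http://id.loc.gov/authorities/feed/"))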