Re: RDF Update Feeds + URI time travel on HTTP-level
On Thu, 2009-11-26 at 00:04 +0000, Richard Cyganiak wrote: If you choose such a rather broad definition for agent-driven negotiation, then you surely must count the practice of sending different responses based on client IP or User-Agent header, both of which are common on the Web, as examples for server-driven conneg. And even different responses based on the client's Cookie header. -- Toby A Inkster mailto:m...@tobyinkster.co.uk http://tobyinkster.co.uk
Re: RDF Update Feeds + URI time travel on HTTP-level
At Wed, 25 Nov 2009 00:21:04 -0500, Michael Nelson wrote: Hi Erik, Thanks for your response. I'm just going to cherry pick a few bits from it: As an aside, which may or may not be related to Memento, do you think there is a useful distinction to be made between web archives which preserve the actual bytestream of an HTTP response made at a certain time (e.g., the Internet Archive) and CMSs that preserve the general content, but allow headers, advertisements, and so on to change (e.g., Wikipedia)? To see what I mean, visit: http://en.wikipedia.org/w/index.php?title=World_Wide_Web&oldid=9419736 and then: http://web.archive.org/web/20050213030130/en.wikipedia.org/wiki/World_Wide_Web I am not sure what the relationship is between these two resources. I'm not 100% sure either. I think this is a difficult problem in web archiving in general. The Wikipedia link with current content substituted is not exactly the 2005 version, but the IA version isn't really what a user would have seen in 2005 either (at least in terms of presentation). And: http://web.archive.org/web/20080103014411/http://www.cnn.com/ for example gives me at least a pop-up ad that is relative to today, not Jan 2008 (there may be better examples where today's content is in-lined, but the point remains the same). I can’t find the pop-up, but the point is well taken. The problem of what I call ‘breaking out’ of archived web content is a very real one when archived web sites are displayed without browser support, using URI ‘rewriting’ and other tricks. The possibility of coming up with a solution for this problem is one reason why I am very excited about this discussion. Still, I think the intention of IA is different from that of Wikipedia’s previous versions. IA attempts to capture and replay the web exactly as it was, while Wikipedia presents the essential content in the same way but surrounds it with the latest tools. While either solution would be helpful to somebody researching the history of a Wikipedia article or to somebody looking for the previous version, only IA’s approach gives you the advertisements, etc., that can be very helpful for researchers. There is the further issue that IA’s copy is third-party and in some ways more trustworthy. Whether sites can generally be trusted to maintain accurate archives of their own content is a question that has already been answered, in my opinion. (The answer is: they can’t.) See, e.g., [1]. As an aside, the Zoetrope system (http://doi.acm.org/10.1145/1498759.1498837) took an entirely different approach to this problem in its archives (see pp. 246-247). They basically took DOM dumps from the client and saved them, rather than a crawler-based URI approach. Thanks for the pointer. My confusion on this issue stems, I believe, from a longstanding confusion that I have had with the 302 Found response. My understanding of 302 Found has always been that, if I visit R and receive a 302 Found with Location R', my browser should continue to consider R the canonical version and use it for all further requests. If I bookmark after having been redirected from R to R', it is in fact R which should be bookmarked, and not R'. If I use my browser to send that link to a friend, my browser should send R, not R'. I believe that this is the meaning given to 302 Found in [3]. I am aware that browsers do not implement what I consider to be the correct behavior here, but it is the way that I understand the definition of 302 Found. Perhaps somebody could help me out by clarifying this for me? 
Firefox will attempt to do the right thing, but it depends on the client maintaining state about the original URI. If you dereference R, then get 302'd to R', a reload in Firefox will be on R and not R'. I hadn’t noticed this before, thank you for pointing it out. Obviously, if you email or share or probably even bookmark R', then this client-side state will be lost and 3rd party reloads will be relative to R' (in fact, that might be what you *want* to occur). But at least within a session, Firefox (and possibly other browsers) will reload wrt the original URI. Although it is not explicit in the current paper or presentation, we're planning on some method for having R' point back to R, to allow Memento-aware clients to know the original URI. We're not sure syntactically how it should be done (a value in the Alternates response header maybe?), but semantically we want R' to point to R. This... I think your email got cut off there. In any case, in the context of actual existing implementations of 302, I think Memento is doing the correct thing. That is, redirection from R to the appropriate content (R') based on conneg makes sense to me, for Memento, if what the user can bookmark and see is the conneg’ed URI (R'). My belief (see [2] and especially [3]) is that properly behaving
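To make the exchange above concrete, here is a minimal Python sketch (not a normative Memento implementation) of the 302 flow being discussed: dereference the original URI R with the experimental X-Accept-Datetime header, observe the 302, and keep R - not the Location target R' - as the canonical, bookmarkable URI. Host, path and date are hypothetical; http.client is used deliberately because it does not follow redirects on its own, so R and R' stay distinct.

import http.client

R_HOST, R_PATH = "example.org", "/resource"  # original URI R (hypothetical)

conn = http.client.HTTPConnection(R_HOST)
conn.request("GET", R_PATH, headers={
    # experimental header per the Memento work; the desired time T
    "X-Accept-Datetime": "Wed, 20 Mar 2008 00:00:00 GMT",
})
resp = conn.getresponse()

if resp.status == 302:
    r_prime = resp.getheader("Location")  # R': the negotiated archived version
    print("canonical URI to bookmark/share:", "http://%s%s" % (R_HOST, R_PATH))
    print("memento URI R':", r_prime)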
Re: RDF Update Feeds + URI time travel on HTTP-level
On Wed, Nov 25, 2009 at 6:08 PM, Michael Nelson m...@cs.odu.edu wrote: I disagree. I would say that agent-driven negotiation is by far the most common form of conneg in use today. Only it's not done through standardized means such as the Alternates header, but instead via language and format specific links embedded in HTML, e.g. Click here for the PDF version, or a Language/country-selector dropdown in the page header, or even via Javascript based selection. While the exact line between them might be hard to draw, I'd argue those aren't HTTP-level events, but instead are HTML-level events. In other words, I would call those examples navigation. In addition, navigation works well for things that can be expressed in HTML wrappers (e.g., click here for the PDF version), but not really for embed/img tags where you want to choose between, say, .png & .gif. I don't draw much of a distinction there, at least for the purposes of discussions like this; they are all URLs in an HTTP response message. Server driven conneg, in comparison, is effectively unused. Ditto for transparent negotiation. I think that is an unfair characterization. I won't guess as to how often it is done, but it is done. It is just not perceived by the user. I didn't mean to imply it wasn't done. As Richard (and Larry, in his referenced message) point out, User-Agent conneg is pretty common. I was just trying to point out that it's not used nearly as often as client selection. Almost every browser sends out various Accept request headers, and it is not uncommon to have Vary and TCN: Choice response headers (check responses from w3c.org, for example). When done with the 200 response + Content-Location header, the URI that the browser displays does not change. I used to use w3.org as an example too, but I've learned since that it's the exception, not the rule, for Web site design. So while I think you are describing agent-driven CN (or something very similar), I also think it would be desirable to go ahead and get the full monty and define the appropriate Accept header and allow server-driven & transparent CN. Agent-driven CN is still available for clients that wish to do so. I just don't understand the desire for server driven conneg when agent driven is the clear winner and has so many advantages; we'll have to agree to disagree on that; I think they are different modalities. Fair enough. I'm just offering you my advice based on my extensive experience in this space. You're free not to believe me, of course 8-) As long as you're also supporting agent driven conneg, I'm happy. - not needing to use the inconsistently-implemented Vary header, so there's no risk of cache pollution. see http://www.mnot.net/blog/2007/06/20/proxy_caching#comment-2989 - more visible to search engines - simpler for developers, as it's just links returned by the server like the rest of their app. no need for new server side modules either I would suggest these are among the reasons we champion the 302 response + Location header approach (as opposed to 200/Content-Location) -- it makes the negotiation more transparent Ah, I see. Yes, I agree that's a good design choice. You might also be interested to read this, where one of the RFC 2616 editors apologizes for supporting server driven conneg; http://www.alvestrand.no/pipermail/ietf-types/2006-April/001707.html Note that he refers to HTTP conneg being broken, but is actually only talking about server driven conneg. 
I would counter with the fact that CN features prominently in: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ http://www.w3.org/TR/cooluris/ Given the role CN plays in these recent documents, it would seem CN has some measure of acceptance in the LOD community. Content negotiation is a valuable tool, so I'm glad there's interest, but IMO, both of those documents misrepresent it by only describing the server-driven form. Mark.
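Since both sides point at live servers (w3c.org in particular) as places where server-driven and transparent CN can be observed, here is a small sketch that makes the machinery visible: send Accept-* request headers and print whatever negotiation-related response headers come back. Which of them appear depends entirely on the server; nothing here is Memento-specific.

import http.client

conn = http.client.HTTPConnection("www.w3.org")
conn.request("GET", "/", headers={
    "Accept": "text/html,application/xhtml+xml;q=0.9",
    "Accept-Language": "en;q=1.0, de;q=0.5",
})
resp = conn.getresponse()
# Vary and TCN are the server-driven / transparent CN fingerprints the
# thread mentions; Content-Location shows the negotiated variant's URI.
for name in ("Vary", "TCN", "Content-Location", "Content-Type", "Content-Language"):
    value = resp.getheader(name)
    if value:
        print(name + ":", value)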
Re: RDF Update Feeds + URI time travel on HTTP-level
Danny Ayers wrote: What Damian said. I keep all my treasures in Subversion, it seems to work. 3rd that; whilst the http time travel conversation goes on - I can't help feeling that going down the date header route is only going to end up in something nobody uses, because it doesn't provide any implementation details to the developer, and thus nobody will adopt it. subversion/webdav/deltav on the other hand, everybody knows; it already works, does the trick and would be easy to implement - essentially all we're saying is let's version control rdf, a concept we can all understand, and at worst the addition of an http response version header tag would pretty much solve exposing all this functionality through http/rest etc. We could handle exposing diffs etc via restful post/get params (?since=r6) - see the sketch below - and also expose different synchronisation endpoints for data, eg on a graph level or a resource level, or however a developer chooses to do it; the point is that simply specifying to use version control and one additional version response header will do the job. it's not perfect, it's not time travel; but it addresses the need in a familiar standards-based way that's been thoroughly thought through and tested; and moreover it'll allow us all to get on and sync our RDF now, rather than in 2 years when it's too late. all imho of course. the only thing I can see that remains is to determine the format / serialization of the updates, primarily deletes - we can take it for granted that all normal triples / quads are new, so all we need to do is find a way of saying X quad / triple has been removed. kind regards, and naive as ever, nathan
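Taking the suggestion literally, a rough sketch of such an endpoint, assuming nothing beyond plain HTTP: one invented response header (X-Graph-Version here) carries the revision, and ?since=rN returns a toy line-per-triple delta. The diff serialization is exactly the open question named above, so the +/- format below is purely illustrative.

from wsgiref.simple_server import make_server
from urllib.parse import parse_qs

CURRENT_REVISION = 7
DIFFS = {6: b"+ <s> <p> <o> .\n- <s> <p> <old> .\n"}  # toy revision store

def app(environ, start_response):
    since = parse_qs(environ.get("QUERY_STRING", "")).get("since", [None])[0]
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("X-Graph-Version", str(CURRENT_REVISION)),  # the one extra header
    ])
    if since:  # e.g. ?since=r6 -> return only the delta since revision 6
        return [DIFFS.get(int(since.lstrip("r")), b"")]
    return [b"<s> <p> <o> .\n"]  # full current graph

make_server("", 8000, app).serve_forever()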
Re: RDF Update Feeds + URI time travel on HTTP-level
Hi All, Apologies, feel like I'm wading in here - but none the less. The issue is how to update / sync RDF, so here's another approach / thought: timestamp the predicate in a triple. Thus you can query a graph as such:
- uri ?p{date} ?o
- ?s uri{date} ?o
- ?s ?p{date} uri
- ?s ?p{date} ?o
and obviously you'd have historic data by nature as well. please do tell me the flaw in my thinking. many regards, nathan
Re: RDF Update Feeds + URI time travel on HTTP-level
Nathan wrote: timestamp the predicate in a triple. please do tell me the flaw in my thinking. scrap that, sorry for the noise - it doesn't cater for indicating data has been removed. The point remains, however, that synchronisation (date or version) data should perhaps be in the RDF rather than outside the scope of RDF?
Re: RDF Update Feeds + URI time travel on HTTP-level
On Nov 25, 2009, at 3:51 AM, Nathan wrote: Danny Ayers wrote: What Damian said. I keep all my treasures in Subversion, it seems to work. 3rd that; whilst the http time travel conversation goes on - I can't help feeling that going down the date header route is only going to end up in something nobody uses; because it doesn't provide any implementation details to the developer, and thus nobody will adopt it. Nathan, Isn't it a bit early in the game to make such a statement? The research results from the Memento project were just published in a paper 2 weeks ago. Give us a little time and we'll have implementation guidelines up on the Memento web site. And, as I indicated before, we have plans to write this up as an I-D => RFC. Cheers Herbert == Herbert Van de Sompel Digital Library Research & Prototyping Los Alamos National Laboratory, Research Library http://public.lanl.gov/herbertv/ tel. +1 505 667 1267
Re: RDF Update Feeds + URI time travel on HTTP-level
Herbert Van de Sompel wrote: On Nov 25, 2009, at 3:51 AM, Nathan wrote: Danny Ayers wrote: What Damian said. I keep all my treasures in Subversion, it seems to work. 3rd that; whilst the http time travel conversation goes on - I can't help feeling that going down the date header route is only going to end up in something nobody uses; because it doesn't provide any implementation details to the developer, and thus nobody will adopt it. Nathan, Isn't it a bit early in the game to make such a statement? The research results from the Memento project were just published in a paper 2 weeks ago. Give us a little time and we'll have implementation guidelines up on the Memento web site. And, as I indicated before, we have plans to write this up as an I-D => RFC. certainly is, and as mentioned off list, the sincerest of apologies; I think Memento is a fascinating idea, and something that definitely needs to be spec'd and hopefully implemented. feeling the stress of an impending deadline, and it just so happens that some form of rdf synchronisation is needed, and thus any involvement from me was on my own private agenda of getting something client-passable working for next week; not the time to be sending off emails to mailing lists - think I'll be quiet till time is free again, unless I have something useful to contribute! many regards, nathan
Re: RDF Update Feeds + URI time travel on HTTP-level
Michael, On Wed, Nov 25, 2009 at 1:07 AM, Michael Nelson m...@cs.odu.edu wrote: What you describe is really close to what RFC 2616 calls Agent-driven Negotiation, which is how CN exists in the absence of Accept-* request headers. That's correct. But the TCN: Choice approach is introduced as an optimization. The idea is that if you know you prefer .en, .pdf and .gz then tell the server when making your original request and it will do its best to honor those requests. We think adding an orthogonal dimension for CN will be similar: if you know you prefer .en, .pdf, .gz and .20091031, then tell the server when making your original request and it will do its best to honor those requests. I understand. In practice, agent-driven CN is rarely done (I can only guess as to why). In practice, you get either server-driven (as defined in RFC 2616) or transparent CN (introduced in RFC 2616 (well, RFC 2068 actually), but really defined in RFCs 2295 & 2296). See: http://httpd.apache.org/docs/2.3/content-negotiation.html I disagree. I would say that agent-driven negotiation is by far the most common form of conneg in use today. Only it's not done through standardized means such as the Alternates header, but instead via language and format specific links embedded in HTML, e.g. Click here for the PDF version, or a Language/country-selector dropdown in the page header, or even via Javascript based selection. Server driven conneg, in comparison, is effectively unused. Ditto for transparent negotiation. So while I think you are describing agent-driven CN (or something very similar), I also think it would be desirable to go ahead and get the full monty and define the appropriate Accept header and allow server-driven & transparent CN. Agent-driven CN is still available for clients that wish to do so. I just don't understand the desire for server driven conneg when agent driven is the clear winner and has so many advantages; - not needing to use the inconsistently-implemented Vary header, so there's no risk of cache pollution. see http://www.mnot.net/blog/2007/06/20/proxy_caching#comment-2989 - more visible to search engines - simpler for developers, as it's just links returned by the server like the rest of their app. no need for new server side modules either You might also be interested to read this, where one of the RFC 2616 editors apologizes for supporting server driven conneg; http://www.alvestrand.no/pipermail/ietf-types/2006-April/001707.html Note that he refers to HTTP conneg being broken, but is actually only talking about server driven conneg. I think that makes for a pretty strong case against it, and I haven't even elaborated on the architectural problems I perceive with it (though some of the advantages above relate closely). Mark.
Re: RDF Update Feeds + URI time travel on HTTP-level
Mark, In practice, agent-driven CN is rarely done (I can only guess as to why). In practice, you get either server-driven (as defined in RFC 2616) or transparent CN (introduced in RFC 2616 (well, RFC 2068 actually), but really defined in RFCs 2295 & 2296). See: http://httpd.apache.org/docs/2.3/content-negotiation.html I disagree. I would say that agent-driven negotiation is by far the most common form of conneg in use today. Only it's not done through standardized means such as the Alternates header, but instead via language and format specific links embedded in HTML, e.g. Click here for the PDF version, or a Language/country-selector dropdown in the page header, or even via Javascript based selection. While the exact line between them might be hard to draw, I'd argue those aren't HTTP-level events, but instead are HTML-level events. In other words, I would call those examples navigation. In addition, navigation works well for things that can be expressed in HTML wrappers (e.g., click here for the PDF version), but not really for embed/img tags where you want to choose between, say, .png & .gif. Server driven conneg, in comparison, is effectively unused. Ditto for transparent negotiation. I think that is an unfair characterization. I won't guess as to how often it is done, but it is done. It is just not perceived by the user. Almost every browser sends out various Accept request headers, and it is not uncommon to have Vary and TCN: Choice response headers (check responses from w3c.org, for example). When done with the 200 response + Content-Location header, the URI that the browser displays does not change. Also, if you link directly to uncool URIs (e.g., foo.gif or bar.html), you won't see any traces of CN in the response because those URIs aren't subject to negotiation. So while I think you are describing agent-driven CN (or something very similar), I also think it would be desirable to go ahead and get the full monty and define the appropriate Accept header and allow server-driven & transparent CN. Agent-driven CN is still available for clients that wish to do so. I just don't understand the desire for server driven conneg when agent driven is the clear winner and has so many advantages; we'll have to agree to disagree on that; I think they are different modalities. - not needing to use the inconsistently-implemented Vary header, so there's no risk of cache pollution. see http://www.mnot.net/blog/2007/06/20/proxy_caching#comment-2989 - more visible to search engines - simpler for developers, as it's just links returned by the server like the rest of their app. no need for new server side modules either I would suggest these are among the reasons we champion the 302 response + Location header approach (as opposed to 200/Content-Location) -- it makes the negotiation more transparent You might also be interested to read this, where one of the RFC 2616 editors apologizes for supporting server driven conneg; http://www.alvestrand.no/pipermail/ietf-types/2006-April/001707.html Note that he refers to HTTP conneg being broken, but is actually only talking about server driven conneg. I would counter with the fact that CN features prominently in: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ http://www.w3.org/TR/cooluris/ Given the role CN plays in these recent documents, it would seem CN has some measure of acceptance in the LOD community. 
regards, Michael I think that makes for a pretty strong case against it, and I haven't even elaborated on the architectural problems I perceive with it (though some of the advantages above relate closely). Mark. Michael L. Nelson m...@cs.odu.edu http://www.cs.odu.edu/~mln/ Dept of Computer Science, Old Dominion University, Norfolk VA 23529 +1 757 683 6393 +1 757 683 4900 (f)
Re: RDF Update Feeds
FWIW, I had a quick look at the current caching support in LOD datasets [1] - not very encouraging, to be honest. Cheers, Michael [1] http://webofdata.wordpress.com/2009/11/23/linked-open-data-http-caching/ -- Dr. Michael Hausenblas LiDRC - Linked Data Research Centre DERI - Digital Enterprise Research Institute NUIG - National University of Ireland, Galway Ireland, Europe Tel. +353 91 495730 http://linkeddata.deri.ie/ http://sw-app.org/about.html From: Michael Hausenblas michael.hausenb...@deri.org Date: Sat, 21 Nov 2009 11:19:18 +0000 To: Hugh Glaser h...@ecs.soton.ac.uk, Georgi Kobilarov georgi.kobila...@gmx.de Cc: Linked Data community public-lod@w3.org Subject: Re: RDF Update Feeds Resent-From: Linked Data community public-lod@w3.org Resent-Date: Sat, 21 Nov 2009 11:19:57 +0000 Georgi, Hugh, Could be very simple by expressing: Pull our update-stream once per second/minute/hour in order to be *enough* up-to-date. Ah, Georgi, I see. You seem to emphasise the quantitative side whereas I just seem to want to flag what kind of source it is. I agree that Pull our update-stream once per second/minute/hour in order to be *enough* up-to-date should be available, however I think that the information regular/irregular vs. how frequent should be made available as well. My main use case is motivated from the LOD application-writing area. I figured that I quite often have written code that essentially does the same: based on the type of data-source it either gets a live copy of the data or uses already locally available data. Now, given that dataset publishers would declare the characteristics of their dataset in terms of dynamics, one could write such a LOD cache quite easily, I guess, abstracting the necessary steps and hence offering a reusable solution. I'll follow up on this one soon via a blog post with a concrete example. My main question would be: what do we gain if we explicitly represent these characteristics, compared to what HTTP provides in terms of caching [1]? One might want to argue that the 'built-in' features are sort of too fine-grained and there is a need for a data-source-level solution. We currently put things like <changefreq>monthly</changefreq> <changefreq>daily</changefreq> <changefreq>never</changefreq> in our semantic sitemaps, and these suggestions seem very similar. Eg http://dotac.rkbexplorer.com/sitemap.xml (And I think these frequencies may correspond to normal sitemaps.) So a naïve approach, if you want RDF, would be to use something very similar (and simple). Of course I am probably known for my naivety, which is often misplaced. Hugh, of course you're right (as often ;). Technically, this sort of information ('changefreq') is available via sitemaps. Essentially, one could lift this to RDF straightforwardly, if desired. If you look closely at what I propose, however, then you'll see that I aim at a sort of qualitative description which could drive my LOD cache (along with the other information I already have from the void:Dataset). Now, before I continue to argue here on a purely theoretical level, lemme implement a demo and come back once I have something to discuss ;) Cheers, Michael [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html -- Dr. Michael Hausenblas LiDRC - Linked Data Research Centre DERI - Digital Enterprise Research Institute NUIG - National University of Ireland, Galway Ireland, Europe Tel. 
+353 91 495730 http://linkeddata.deri.ie/ http://sw-app.org/about.html From: Hugh Glaser h...@ecs.soton.ac.uk Date: Fri, 20 Nov 2009 18:29:17 +0000 To: Georgi Kobilarov georgi.kobila...@gmx.de, Michael Hausenblas michael.hausenb...@deri.org Cc: Linked Data community public-lod@w3.org Subject: Re: RDF Update Feeds Sorry if I have missed something, but... We currently put things like <changefreq>monthly</changefreq> <changefreq>daily</changefreq> <changefreq>never</changefreq> in our semantic sitemaps, and these suggestions seem very similar. Eg http://dotac.rkbexplorer.com/sitemap.xml (And I think these frequencies may correspond to normal sitemaps.) So a naïve approach, if you want RDF, would be to use something very similar (and simple). Of course I am probably known for my naivety, which is often misplaced. Best Hugh On 20/11/2009 17:47, Georgi Kobilarov georgi.kobila...@gmx.de wrote: Hi Michael, nice write-up on the wiki! But I think the vocabulary you're proposing is too generally descriptive. Dataset publishers, once offering update feeds, should not only tell that/if their datasets are dynamic, but instead how dynamic they are. Could be very simple by expressing: Pull our update-stream once per second/minute/hour in order to be *enough* up-to-date. Makes sense? Cheers, Georgi -- Georgi Kobilarov www.georgikobilarov.com -----Original Message----- From: Michael Hausenblas [mailto:michael.hausenb...@deri.org] Sent: Friday, November 20, 2009 4:01 PM To: Georgi Kobilarov Cc: Linked Data community Subject: Re: RDF Update
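In the spirit of the caching survey in [1], a quick sketch of how one might probe a dataset URI for the caching metadata HTTP already provides (RFC 2616 section 13); dbpedia.org/resource/Paris is just an example target.

import http.client

conn = http.client.HTTPConnection("dbpedia.org")
conn.request("HEAD", "/resource/Paris")
resp = conn.getresponse()
# absent headers are the "not very encouraging" part
for name in ("Last-Modified", "ETag", "Cache-Control", "Expires"):
    print(name + ":", resp.getheader(name) or "(absent)")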
Re: RDF Update Feeds + URI time travel on HTTP-level
On Nov 23, 2009, at 9:02 PM, Herbert Van de Sompel wrote: On Nov 23, 2009, at 4:59 PM, Erik Hetzner wrote: At Mon, 23 Nov 2009 00:40:33 -0500, Mark Baker wrote: On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell ansell.pe...@gmail.com wrote: It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time based checkpoints. You seem to have read more into my argument than was there, and created a strawman; I agree with the above. My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore, a request for a representation of the state of Resource X at time T needs to be directed at the URI for Resource X at time T, not Resource X. I think this is a very compelling argument. Actually, I don't think it is. The issue was also brought up (in a significantly more tentative manner) in Pete Johnston's blog entry on eFoundations (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html ). Tomorrow, we will post a response that will try and show that the current state issue is - as far as we can see - not quite as written in stone as suggested above in the specs that matter in this case, i.e. Architecture of the World Wide Web and RFC 2616. Both are interestingly vague about this. Just to let you know that our response to some issues re Memento raised here and on Pete Johnston's blog post (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html ) is now available at: http://www.cs.odu.edu/~mln/memento/response-2009-11-24.html We have also submitted this as an inline Comment to Pete's blog, but Comments require approval and that has not happened yet. Greetings Herbert Van de Sompel == Herbert Van de Sompel Digital Library Research & Prototyping Los Alamos National Laboratory, Research Library http://public.lanl.gov/herbertv/ tel. +1 505 667 1267
Re: RDF Update Feeds + URI time travel on HTTP-level
Herbert, On Tue, Nov 24, 2009 at 6:10 PM, Herbert Van de Sompel hvds...@gmail.com wrote: Just to let you know that our response to some issues re Memento raised here and on Pete Johnston's blog post (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html) is now available at: http://www.cs.odu.edu/~mln/memento/response-2009-11-24.html Regarding the suggestion to use the Link header, I was thinking the same thing. But the way you describe it being used is different than how I would suggest it be used. Instead of providing a link to each available representation, the server would just provide a single link to the timegate. The client could then GET the timegate URI and find either the list of URIs (along with date metadata), or some kind of form-like declaration that would permit it to specify the date/time for which it desires a representation (e.g. Open Search). Perhaps this is what you meant by timemap, I can't tell, though I don't see a need for the use of the Accept header in that case if the client can either choose or construct a URI for the desired archived representation. As for the current state issue, you're right that it isn't a general constraint of Web architecture. I was assuming we were talking only about the origin server. Of course, any Web component can be asked for a representation of any resource, and they are free to answer those requests in whatever way suits their purpose, including providing historical versions. Mark.
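A sketch of this single-link variant, under the assumption that the server advertises its timegate in an HTTP Link header with a (not yet standardized) rel="timegate" value; the host and the timegate's own query interface are hypothetical.

import http.client
import re

conn = http.client.HTTPConnection("example.org")
conn.request("GET", "/resource")
resp = conn.getresponse()
link = resp.getheader("Link", "")
# e.g. Link: <http://example.org/timegate/resource>; rel="timegate"
m = re.search(r'<([^>]+)>;\s*rel="timegate"', link)
if m:
    timegate_uri = m.group(1)
    print("timegate:", timegate_uri)
    # a second GET against timegate_uri would then carry (or query for)
    # the desired datetime, or fetch the timemap/form-like description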
Re: RDF Update Feeds + URI time travel on HTTP-level
Good man, I couldn't help thinking there was a paper in that... 2009/11/22 Herbert Van de Sompel hvds...@gmail.com: hi all, (thanks Chris, Richard, Danny) In light of the current discussion, I would like to provide some clarifications regarding Memento: Time Travel for the Web, i.e. the idea of introducing HTTP content negotiation in the datetime dimension: (*) Some extra pointers: - For those who prefer browsing slides over reading a paper, there is http://www.slideshare.net/hvdsomp/memento-time-travel-for-the-web - Around mid next week, a video recording of a presentation I gave on Memento should be available at http://www.oclc.org/research/dss/default.htm - The Memento site is at http://www.mementoweb.org. Of special interest may be the proposed HTTP interactions for (a) web servers with internal archival capabilities such as content management systems, version control systems, etc (http://www.mementoweb.org/guide/http/local/) and (b) web servers without internal archival capabilities (http://www.mementoweb.org/guide/http/remote/). (*) The overall motivation for the work is the integration of archived resources into regular web navigation by making them available via their original URIs. The archived resources we have focused on in our experiments so far are those kept by (a) Web Archives such as the Internet Archive, WebCite, archive-it.org and (b) Content Management Systems such as wikis, CVS, ... The reason I pinged Chris Bizer about our work is that we thought that our proposed approach could also be of interest in the LoD environment. Specifically, the ability to get to prior descriptions of LoD resources by doing datetime content negotiation on their URI seemed appealing; e.g. what was the dbpedia description for the City of Paris on March 20 2008? This ability would, for example, allow analysis of (the evolution of) data over time. The requirement that is currently being discussed in this thread (which I interpret to be about approaches to selectively get updates for a certain LoD database) is not one I had considered using Memento for, thinking this was more in the realm of feed technologies such as Atom (as suggested by Ed Summers), or the pre-REST OAI-PMH (http://www.openarchives.org/OAI/openarchivesprotocol.html). (*) Regarding some issues that were brought up in the discussion so far: - We use an X header because that seems to be best practice when doing experimental work. We would very much like to eventually migrate to a real header, e.g. Accept-Datetime. - We are definitely considering and interested in some way to formalize our proposal in a specification document. We felt that the I-D/RFC path would have been the appropriate one, but are obviously open to other approaches. - As suggested by Richard, there is a bootstrapping problem, as there is with many new paradigms that are introduced. I trust LoD developers fully understand this problem. Actually, the problem is not only at the browser level but also at the server level. We are currently working on a Firefox plug-in that, when ready, will be available through the regular channels. And we have successfully (and experimentally) modified the Mozilla code itself to be able to demonstrate the approach. We are very interested in getting support in other browsers, natively or via plug-ins. We also have some tools available to help with initial deployment (http://www.mementoweb.org/tools/). 
One is a plug-in for the mediawiki platform; when installed, the wiki natively supports datetime content negotiation and redirects a client to the history page that was active at the datetime requested in the X-Accept-Datetime header. We just started a Google group for developers interested in making Memento happen for their web servers, content management systems, etc. (http://groups.google.com/group/memento-dev/). (*) Note that the proposed solution also leverages the OAI-ORE specification (fully compliant with LoD best practice) as a mechanism to support discovery of archived resources. I hope this helps to get a better understanding of what Memento is about, and what its current status is. Let me end by stating that we would very much like to get these ideas broadly adopted. And we understand we will need a lot of help to make that happen. Cheers Herbert == Herbert Van de Sompel Digital Library Research & Prototyping Los Alamos National Laboratory, Research Library http://public.lanl.gov/herbertv/ tel. +1 505 667 1267 -- http://danny.ayers.name
Re: RDF Update Feeds + URI time travel on HTTP-level
2009/11/25 Michael Nelson m...@cs.odu.edu: In practice, agent-driven CN is rarely done (I can only guess as to why). In practice, you get either server-driven (as defined in RFC 2616) or transparent CN (introduced in RFC 2616 (well, RFC 2068 actually), but really defined in RFCs 2295 & 2296). See: http://httpd.apache.org/docs/2.3/content-negotiation.html My guess is that it relies on users making decisions that they aren't generally qualified, or concerned enough, to make. Language is basically a constant from the user's operating system configuration, and format differences do not affect users enough to warrant giving them a choice between XHTML and HTML, or JPG and PNG, for example. I think browser designers see CN as a good thing for them, but basically irrelevant to users, and hence they decide it is easiest to just automate the process using server or transparent negotiation. Similar reasoning explains why Apache goes so far to try to break down what are likely unintentional mix-ups with equal q/qs value combinations, as it reduces confusion for the user. The fact that the server and transparent CN processes rely on servers for part of the decision (qs) makes it perfectly fine for them to make the tie-breaker decision, in my opinion. There is basically no reason why the choice the server makes will be inconvenient for users, as they already said that both formats or languages were acceptable in some way through the Accept- headers. Combined with the server's knowledge, the tie-breaker will only choose one slightly better format compared to another decent format, resulting in a win-win scenario according to the user's declared preferences. As long as the server sends back the real Content-Type it chose, I am happy. Cheers, Peter
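The tie-breaking arithmetic described above is easy to picture. A toy illustration (all values invented) of combining the client's q with the server's qs, which is precisely the computation transparent CN delegates to the server:

# client says both are acceptable (from Accept); server prefers PNG (qs)
client_q = {"image/png": 1.0, "image/gif": 1.0}
server_qs = {"image/png": 0.9, "image/gif": 0.7}

# pick the variant maximizing q * qs -- the server-side tie-breaker
best = max(server_qs, key=lambda t: client_q.get(t, 0.0) * server_qs[t])
print("chosen variant:", best)  # image/png: a win-win per the declared preferences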
Re: RDF Update Feeds + URI time travel on HTTP-level
On Mon, Nov 23, 2009 at 1:01 AM, Peter Ansell ansell.pe...@gmail.com wrote: The issue with requiring people to direct requests at the URI for the Resource X at time T is that the circular linking issue I described previously comes into play, because people need to pre-engineer their URIs to be compatible with a temporal dimension. I would recommend the use of a query parameter. If the user didn't know exactly what time scales were used by the server they would either need to follow a roughly drawn up convention, such as YYYY/MM/DD/meaningfulresourcename, or they would have to find an index somewhere, neither of which are as promising for the future of the web as having the ability to add another header to provide the desired behaviour IMO. I'm not sure what criteria you're basing that evaluation on, but IME it's far simpler to deploy a new relation type than a new HTTP header. Headers are largely opaque to Web developers. The documentation of the Vary header [1] seems to leave the situation open as to whether the server needs to be concerned about which, if any, headers dictate which resource representation is to be returned. Caching in the context of HTTP/1.1 may have been designed to be temporary, but I see no particular reason why a temporal Accept-* header, together with the possibility of its addition to Vary, couldn't be used on the absolute time dimension. It seems much cleaner than adding an extra command to HTTP, or requiring some other non-HTTP mechanism altogether. The extra header would never stop a server from returning the current version if it doesn't recognise the header, or it doesn't keep a version history, so it should be completely backwards compatible. Yes, Vary should, in theory, be used for this purpose. Unfortunately, in practice, due to a bug in IE, it has the effect of disabling caching in the browser and so you don't see it used very much, at least not for browser based applications; http://www.ilikespam.com/blog/internet-explorer-meets-the-vary-header Mark.
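For concreteness, a minimal sketch of the serving side of Peter's suggestion: if the representation depends on a temporal header, advertise that in Vary so caches key on it. The header name follows the Memento experiment; whether deployed caches and IE-era browsers cope with Vary is exactly the caveat above.

from wsgiref.simple_server import make_server

def app(environ, start_response):
    # CGI-style name for the X-Accept-Datetime request header
    requested = environ.get("HTTP_X_ACCEPT_DATETIME")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Vary", "X-Accept-Datetime"),  # cache key must include the header
    ])
    return [("version for: %s\n" % (requested or "now")).encode()]

make_server("", 8001, app).serve_forever()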
Re: RDF Update Feeds + URI time travel on HTTP-level
At Mon, 23 Nov 2009 00:40:33 -0500, Mark Baker wrote: On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell ansell.pe...@gmail.com wrote: It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time based checkpoints. You seem to have read more into my argument than was there, and created a strawman; I agree with the above. My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore, a request for a representation of the state of Resource X at time T needs to be directed at the URI for Resource X at time T, not Resource X. I think this is a very compelling argument. On the other hand, there is nothing I can see that prevents one URI from representing another URI as it changes through time. This is already the case with, e.g., http://web.archive.org/web/*/http://example.org, which represents the URI http://example.org at all times. So this URI could, perhaps, be a target for X-Accept-Datetime headers. There is something else that I find problematic about the Memento proposal. Archival versions of a web page are too important to hide inside HTTP headers. To take the canonical example, if I am viewing http://oakland.example.org/weather, I don’t want the fact that I am viewing historical weather information to be hidden in the request headers. Furthermore, if I am viewing resource X as it appeared at time T1, I should *not* be able to copy that URI and send it to a friend, or use it as a reference in a document, only to have them see the URI as it appears at time T2. I think that those of us in the web archiving community [1] would very much appreciate a serious look by the web architecture community into the problem of web archiving. The problem of representing and resolving the tuple (URI, time) is a question which has not yet been adequately dealt with. best, Erik Hetzner 1. Those unfamiliar with web archives are encouraged to visit http://web.archive.org/, http://www.archive-it.org/, http://www.vefsafn.is/, http://webarchives.cdlib.org/, ... ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: RDF Update Feeds + URI time travel on HTTP-level
2009/11/24 Erik Hetzner erik.hetz...@ucop.edu: At Mon, 23 Nov 2009 00:40:33 -0500, Mark Baker wrote: On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell ansell.pe...@gmail.com wrote: It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time based checkpoints. You seem to have read more into my argument than was there, and created a strawman; I agree with the above. My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore, a request for a representation of the state of Resource X at time T needs to be directed at the URI for Resource X at time T, not Resource X. I think this is a very compelling argument. On the other hand, there is nothing I can see that prevents one URI from representing another URI as it changes through time. This is already the case with, e.g., http://web.archive.org/web/*/http://example.org, which represents the URI http://example.org at all times. So this URI could, perhaps, be a target for X-Accept-Datetime headers. This is still a different URI though, and requires you to know that web.archive.org exists and that it has in fact trawled example.org. There is something else that I find problematic about the Memento proposal. Archival versions of a web page are too important to hide inside HTTP headers. The clean aspect of using headers is that you don't have to munge the URI or attach it to the path of another URI in order to make the process work. To take the canonical example, if I am viewing http://oakland.example.org/weather, I don’t want the fact that I am viewing historical weather information to be hidden in the request headers. The user-agent could help here. Furthermore, if I am viewing resource X as it appeared at time T1, I should *not* be able to copy that URI and send it to a friend, or use it as a reference in a document, only to have them see the URI as it appears at time T2. Current web citation methods typically require that you put Accessed on DD MM YY next to the URI if you want to publish it. If you were viewing it at T1 and that wasn't the current version then your user-agent would need to let you know that you were not viewing the most up-to-date copy of the resource. I think that those of us in the web archiving community [1] would very much appreciate a serious look by the web architecture community into the problem of web archiving. The problem of representing and resolving the tuple (URI, time) is a question which has not yet been adequately dealt with. It would still be nice to solve the issue in general so that we don't have to rely on archiving services in order to get past versions if you could do it by negotiating directly with the original server. Cheers, Peter
Re: RDF Update Feeds + URI time travel on HTTP-level
At Tue, 24 Nov 2009 10:14:01 +1000, Peter Ansell wrote: 2009/11/24 Erik Hetzner erik.hetz...@ucop.edu: […] On the other hand, there is nothing I can see that prevents one URI from representing another URI as it changes through time. This is already the case with, e.g., http://web.archive.org/web/*/http://example.org, which represents the URI http://example.org at all times. So this URI could, perhaps, be a target for X-Accept-Datetime headers. This is still a different URI though, and requires you to know that web.archive.org exists and that it has in fact trawled example.org. I agree. I was trying to suggest that, while I agree with Mark Baker that: all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore, a request for a representation of the state of Resource X at time T needs to be directed at the URI for Resource X at time T, not Resource X. there could conceivably be a resource, e.g., http://web.archive.org/web/*/http://example.org/, whose representation could vary based on HTTP headers, because it represents all versions of another resource http://example.org/ as that other resource varied across time. The clean aspect of using headers is that you don't have to munge the URI or attach it to the path of another URI in order to make the process work. I agree that it is nice to be able to not munge URIs to get archival content. Rewriting URIs for archived web content is a very difficult task which is prone to error, and if a user is browsing a web archive they often end up with ‘live’ (unarchived) web content in embeds, etc. instead of the archived content. But if the tradeoff for not munging URIs is to hide the archival nature of a resource in the HTTP headers, I don’t think it is worth it. To take the canonical example, if I am viewing http://oakland.example.org/weather, I don’t want the fact that I am viewing historical weather information to be hidden in the request headers. The user-agent could help here. Perhaps it could, but I don’t think overloading the meaning of the resource that currently represents the current weather with historical weather data is a good idea. Current web citation methods typically require that you put Accessed on DD MM YY next to the URI if you want to publish it. If you were viewing it at T1 and that wasn't the current version then your user-agent would need to let you know that you were not viewing the most up-to-date copy of the resource. I would prefer to move away from current web citation methods. These methods provide no way for an author to ensure that (as much as possible) a reader will encounter the same text that the author read, and they provide no way for the typical reader to find the text as it was read by the author. If we are enhancing user agents and requiring user interaction, why not enhance a user agent with a feature that, given resource X at the current time T, directs a user to a new URI which uniquely identifies resource X at time T, a URI that can be copied & pasted as a whole into a document. Then the author can be reasonably assured that a reader will be viewing the same content the author viewed. I think that those of us in the web archiving community [1] would very much appreciate a serious look by the web architecture community into the problem of web archiving. The problem of representing and resolving the tuple (URI, time) is a question which has not yet been adequately dealt with. 
It would still be nice to solve the issue in general so that we don't have to rely on archiving services in order to get past versions, if you could do it by negotiating directly with the original server. Agreed! Furthermore, it would be nice to solve the problem in such a way that: a) the server could provide the past version; b) failing that, web archive A could provide the past version; c) failing that, web archive B could provide the past version; d) and so on. best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
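A sketch of that fallback chain: ask the origin server itself for the past version first, then fall back to web archives in order. The X-Accept-Datetime header is the thread's experimental header; the second archive URI pattern is a made-up placeholder, and real code would need far more robust URI handling.

import http.client
from urllib.parse import urlsplit

ARCHIVE_PATTERNS = [
    "http://web.archive.org/web/{ts}/{uri}",  # (b) Internet Archive layout
    "http://archive-b.example/{ts}/{uri}",    # (c) hypothetical archive B
]

def fetch(uri, accept_datetime):
    parts = urlsplit(uri)
    conn = http.client.HTTPConnection(parts.netloc)
    conn.request("GET", parts.path or "/",
                 headers={"X-Accept-Datetime": accept_datetime})
    return conn.getresponse()

def resolve(uri, ts, accept_datetime):
    resp = fetch(uri, accept_datetime)  # (a) the origin server first
    if resp.status in (200, 302):
        return resp
    for pattern in ARCHIVE_PATTERNS:  # (b), (c), ... archives in turn
        resp = fetch(pattern.format(ts=ts, uri=uri), accept_datetime)
        if resp.status == 200:
            return resp
    return None  # (d) ran out of options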
Re: RDF Update Feeds + URI time travel on HTTP-level
On Nov 23, 2009, at 4:59 PM, Erik Hetzner wrote: At Mon, 23 Nov 2009 00:40:33 -0500, Mark Baker wrote: On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell ansell.pe...@gmail.com wrote: It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time based checkpoints. You seem to have read more into my argument than was there, and created a strawman; I agree with the above. My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore, a request for a representation of the state of Resource X at time T needs to be directed at the URI for Resource X at time T, not Resource X. I think this is a very compelling argument. Actually, I don't think it is. The issue was also brought up (in a significantly more tentative manner) in Pete Johnston's blog entry on eFoundations (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html ). Tomorrow, we will post a response that will try and show that the current state issue is - as far as we can see - not quite as written in stone as suggested above in the specs that matter in this case, i.e. Architecture of the World Wide Web and RFC 2616. Both are interestingly vague about this. On the other hand, there is nothing I can see that prevents one URI from representing another URI as it changes through time. This is already the case with, e.g., http://web.archive.org/web/*/http://example.org, which represents the URI http://example.org at all times. So this URI could, perhaps, be a target for X-Accept-Datetime headers. That is actually what we do in Memento (see our paper http://arxiv.org/abs/0911.1112), and we recognize two cases here: (1) If the web server does not keep track of its own archival versions, then we must rely on archival versions that are stored elsewhere, i.e. in Web Archives. In this case, the original server who receives the request can redirect the client to a resource like the one you mention above, i.e. a resource that stands for archived versions of another resource. Note that this redirect is a simple redirect like the ones that happen all the time on the Web. This is not a redirect that is part of a datetime content negotiation flow, rather a redirect that occurs because the server has detected an X-Accept-Datetime header. Now, we don't want to overload the existing http://web.archive.org/web/*/http://example.org as you suggest, but rather choose to introduce a special-purpose resource that we call a TimeGate: http://web.archive.org/web/timegate/http://example.org . And we indeed introduce this resource as a target for datetime content negotiation. (2) If the web server does keep track of its own archival versions (think CMS), then it can handle requests for old versions locally, as it has all the information that is required to do so. In this case, we could also introduce a special-purpose, distinct, TimeGate on this server, and have the original resource redirect to it. That would make this case in essence the same as (1) above. 
This, however, seemed like a bit of overkill, and we felt that the original resource and the TimeGate could coincide, meaning datetime content negotiation occurs directly against the original resource. That is, the URI that represents the resource as it evolves over time is the URI of the resource itself. It stands for past and present versions. The present version is delivered (200 OK) from that URI itself (business as usual); archived versions are delivered from other resources via content negotiation (302 with Location different than the original URI). In both (1) and (2) the original resource plays a role in the framework, either because it redirects to an external TimeGate that performs the datetime content negotiation, or because it performs the datetime content negotiation itself. And we actually think it is quite essential that this original resource is involved. It is the URI of the original resource by which the resource has been known as it evolved over time. It makes sense to be able to use that URI to try and get to its past versions. And by get, I don't mean search for it, but rather use the network to get there. After all, we all go by the same name irrespective of the day you talk to us. Or we have the same Linked Data URI irrespective of the day it is dereferenced. Why would we suddenly need a new URI when we want to see what the LoD description for any of us was, say, a year ago? Why must we prevent that this same URI helps us to get to prior versions?
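A sketch of case (2), where a server that keeps its own history acts as its own TimeGate: the original URI answers 200 for the current state (business as usual) and 302s to a distinct archived-version URI when the request carries the experimental datetime header. The version index and URI layout are invented for illustration; real code would parse the requested datetime and pick the closest version.

from wsgiref.simple_server import make_server

VERSIONS = {"20080320": "/resource/versions/20080320"}  # toy history

def app(environ, start_response):
    asked = environ.get("HTTP_X_ACCEPT_DATETIME")
    if asked:  # datetime conneg requested -> 302 to the archived version
        start_response("302 Found", [("Location", VERSIONS["20080320"])])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"current state of the resource\n"]

make_server("", 8002, app).serve_forever()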
Re: RDF Update Feeds + URI time travel on HTTP-level
2009/11/22 Richard Cyganiak rich...@cyganiak.de: On 20 Nov 2009, at 19:07, Chris Bizer wrote: [snips] From a web architecture POV it seems pretty solid to me. Doing stuff via headers is considered bad if you could just as well do it via links and additional URIs, but you can argue that the time dimension is such a universal thing that a header-based solution is warranted. Sounds good to me too, but x-headers are a jump, I think perhaps it's a question worthy of throwing at the W3C TAG - pretty sure they've looked at similar stuff in the past, but things are changing fast... From what I can gather, proper diffs over time are hard (long before you get to them logics). But Web-like diffs don't have to be - they can't be any less reliable than my online credit card statement. Bit worrying there are so many different approaches available, sounds like there could be a lot of coding time wasted. But then again, it might well be one for evolution - and in the virtual world trying stuff out is usually worth it. The main drawback IMO is that existing clients, such as all web browsers, will be unable to access the archived versions, because they don't know about the header. If you are archiving web pages or RDF documents, then you could add links that lead clients to the archived versions, but that won't work for images, PDFs and so forth. Hmm. For one, browsers are in flux; for two, you probably wouldn't expect that kind of agent to give you anything but the latest. If I need last year's version, I follow my nose through URIs (as in svn etc) - that kind of thing has to be a fallback, imho. In summary, I think it's pretty cool. Cool idea, for sure. It is something strong... ok, temporal stuff should be available down at quite a low level, especially given that things like xmpp will be bouncing around - but I reckon Richard's right in suggesting the plain old URI thing will currently serve most purposes. Cheers, Danny. -- http://danny.ayers.name
Re: RDF Update Feeds + URI time travel on HTTP-level
On 22 Nov 2009, at 09:39, Danny Ayers wrote: 2009/11/22 Richard Cyganiak rich...@cyganiak.de: On 20 Nov 2009, at 19:07, Chris Bizer wrote: [snips] From a web architecture POV it seems pretty solid to me. Doing stuff via headers is considered bad if you could just as well do it via links and additional URIs, but you can argue that the time dimension is such a universal thing that a header-based solution is warranted. Sounds good to me too, but x-headers are a jump, I think perhaps it's a question worthy of throwing at the W3C TAG - pretty sure they've looked at similar stuff in the past, but things are changing fast... See also http://tools.ietf.org/html/rfc3253 Subversion is a partial deltav implementation. It may well be the only deployed implementation. Damian
Re: RDF Update Feeds + URI time travel on HTTP-level
Damian Steer wrote: On 22 Nov 2009, at 09:39, Danny Ayers wrote: 2009/11/22 Richard Cyganiak rich...@cyganiak.de: On 20 Nov 2009, at 19:07, Chris Bizer wrote: [snips] From a web architecture POV it seems pretty solid to me. Doing stuff via headers is considered bad if you could just as well do it via links and additional URIs, but you can argue that the time dimension is such a universal thing that a header-based solution is warranted. Sounds good to me too, but x-headers are a jump, I think perhaps it's a question worthy of throwing at the W3C TAG - pretty sure they've looked at similar stuff in the past, but things are changing fast... See also http://tools.ietf.org/html/rfc3253 Subversion is a partial deltav implementation. It may well be the only deployed implementation. surely virtuoso webdav w/ ods briefcase can be classed as a deployed implementation; unsure of status re forking etc but most of it's there and functioning v well. nathan
Re: RDF Update Feeds + URI time travel on HTTP-level
Nathan wrote: Damian Steer wrote: On 22 Nov 2009, at 09:39, Danny Ayers wrote: 2009/11/22 Richard Cyganiak rich...@cyganiak.de: On 20 Nov 2009, at 19:07, Chris Bizer wrote: [snips] From a web architecture POV it seems pretty solid to me. Doing stuff via headers is considered bad if you could just as well do it via links and additional URIs, but you can argue that the time dimension is such a universal thing that a header-based solution is warranted. Sounds good to me too, but x-headers are a jump, I think perhaps it's a question worthy of throwing at the W3C TAG - pretty sure they've looked at similar stuff in the past, but things are changing fast... See also http://tools.ietf.org/html/rfc3253 Subversion is a partial deltav implementation. It may well be the only deployed implementation. surely virtuoso webdav w/ ods briefcase can be classed as a deployed implementation; unsure of status re forking etc but most of it's there and functioning v well. nathan Nathan, Yes, but as usual we prefer to wait for some kind of consensus, and then we just put the relevant aspect of Virtuoso into play. In a nutshell, this is why we committed to industry standards from the get-go, since doing so reduces this kind of work to functionality orchestration :-) WebDAV, Atom Pub, GData etc.. have all existed inside Virtuoso for a long time, but on their own the net effect has sometimes been confusion (due to value pyramid inversion on the part of its beholders). We also see XMPP (which you've alluded to recently, and bugged about by Danbri for some time) and XMPP++ (Google Wave) as interesting. Ditto PubSubHubbub etc.. Also note, replication and synchronization e.g., via transaction logs (in the most sophisticated cases) is something Virtuoso has handled across SQL DBMS engines that provide API access to transaction logs for eons, so this is all very familiar territory. I still remember confusion, at the advent of blogging, when we indicated the existence of Atom and RSS aggregation and indexing support inside Virtuoso (sure you can Google up on that) :-) Giovanni: why isn't the RDFsync protocol (from yourself and Orri) part of this conversation? My silence during this conversation has been deliberate :-) -- Regards, Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen President & CEO, OpenLink Software Web: http://www.openlinksw.com
Re: RDF Update Feeds + URI time travel on HTTP-level
hi all,

(thanks Chris, Richard, Danny)

In light of the current discussion, I would like to provide some clarifications regarding Memento: Time Travel for the Web, i.e. the idea of introducing HTTP content negotiation in the datetime dimension:

(*) Some extra pointers:

- For those who prefer browsing slides over reading a paper, there is http://www.slideshare.net/hvdsomp/memento-time-travel-for-the-web
- Around the middle of next week, a video recording of a presentation I gave on Memento should be available at http://www.oclc.org/research/dss/default.htm
- The Memento site is at http://www.mementoweb.org. Of special interest may be the proposed HTTP interactions for (a) web servers with internal archival capabilities, such as content management systems, version control systems, etc. (http://www.mementoweb.org/guide/http/local/) and (b) web servers without internal archival capabilities (http://www.mementoweb.org/guide/http/remote/).

(*) The overall motivation for the work is the integration of archived resources into regular web navigation by making them available via their original URIs. The archived resources we have focused on in our experiments so far are those kept by (a) web archives such as the Internet Archive, WebCite, archive-it.org, and (b) content management systems such as wikis, CVS, ...

The reason I pinged Chris Bizer about our work is that we thought that our proposed approach could also be of interest in the LoD environment. Specifically, the ability to get to prior descriptions of LoD resources by doing datetime content negotiation on their URI seemed appealing; e.g. what was the DBpedia description for the City of Paris on March 20 2008? This ability would, for example, allow analysis of (the evolution of) data over time.

The requirement that is currently being discussed in this thread (which I interpret to be about approaches to selectively get updates for a certain LoD database) is not one I had considered using Memento for, thinking this was more in the realm of feed technologies such as Atom (as suggested by Ed Summers), or the pre-REST OAI-PMH (http://www.openarchives.org/OAI/openarchivesprotocol.html).

(*) Regarding some issues that were brought up in the discussion so far:

- We use an X header because that seems to be best practice when doing experimental work. We would very much like to eventually migrate to a real header, e.g. Accept-Datetime.
- We are definitely considering, and interested in, some way to formalize our proposal in a specification document. We felt that the I-D/RFC path would have been the appropriate one, but are obviously open to other approaches.
- As suggested by Richard, there is a bootstrapping problem, as there is with many new paradigms that are introduced. I trust LoD developers fully understand this problem. Actually, the problem exists not only at the browser level but also at the server level. We are currently working on a Firefox plug-in that, when ready, will be available through the regular channels. And we have successfully (and experimentally) modified the Mozilla code itself to be able to demonstrate the approach. We are very interested in getting support in other browsers, natively or via plug-ins. We also have some tools available to help with initial deployment (http://www.mementoweb.org/tools/). One is a plug-in for the MediaWiki platform; when installed, the wiki natively supports datetime content negotiation and redirects a client to the history page that was active at the datetime requested in the X-Accept-Datetime header.
We just started a Google group for developers interested in making Memento happen for their web servers, content management systems, etc. (http://groups.google.com/group/memento-dev/).

(*) Note that the proposed solution also leverages the OAI-ORE specification (fully compliant with LoD best practice) as a mechanism to support discovery of archived resources.

I hope this helps to get a better understanding of what Memento is about, and what its current status is. Let me end by stating that we would very much like to get these ideas broadly adopted. And we understand we will need a lot of help to make that happen.

Cheers

Herbert

==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
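[The negotiation Herbert describes is easy to exercise from a script. Below is a minimal client sketch in Python, assuming a server that honours the experimental X-Accept-Datetime header (with an HTTP-date value) and redirects to the archived version; the DBpedia URI and date are only illustrative.]

    # Minimal sketch of datetime content negotiation as described above.
    # Assumes a Memento-aware server; everything besides the header name
    # is illustrative.
    import urllib.request

    def get_memento(uri, when):
        req = urllib.request.Request(uri)
        req.add_header("X-Accept-Datetime", when)
        # urllib follows redirects, so the final URL is the archived version.
        with urllib.request.urlopen(req) as resp:
            return resp.geturl(), resp.read()

    final_uri, body = get_memento("http://dbpedia.org/resource/Paris",
                                  "Thu, 20 Mar 2008 00:00:00 GMT")
    print(final_uri)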
Re: RDF Update Feeds + URI time travel on HTTP-level
Hi Chris,

On Fri, Nov 20, 2009 at 1:07 PM, Chris Bizer ch...@bizer.de wrote: [snip]

Sounds cool to me. Anybody an opinion whether this violates general Web architecture somewhere?

IMO, it does. The problem is that an HTTP request with the Accept-Datetime header is logically targeting a different resource than the one identified in the Request-URI. Accept-* headers are for negotiating the selection of resource *representations*, not resources. Resource selection should always be handled via hypermedia.

Mark.
Re: RDF Update Feeds + URI time travel on HTTP-level
2009/11/23 Mark Baker dist...@acm.org: [snip]

IMO, it does. The problem is that an HTTP request with the Accept-Datetime header is logically targeting a different resource than the one identified in the Request-URI. Accept-* headers are for negotiating the selection of resource *representations*, not resources. Resource selection should always be handled via hypermedia.

I think in general it is likely to target a different representation of the same resource, just in the time dimension rather than in the format dimensions that Accept headers currently negotiate over. Arguing that a resource is not different if it has non-equal binary representations in the format dimension at a particular point in time is no different, IMO, to arguing that the nature of the resource has not changed because of one or more intentional, non-nature-affecting changes in one of its binary representations through time.

The use of language as an accept header allows people to select between representations that do not necessarily contain the same information, as the translation might not be complete, or there may be semantic ambiguity that makes it impossible to reliably translate back and forth between the documents without some information loss.

If it is consensus that the time dimension is always a special case where the nature of a resource actually changes whenever the bits change, then I think it would be more appropriate to use different identifying features, such as locators, to retrieve the thing; but currently I think the case is not very convincing, given the current documentation of the Accept possibilities.

In a non-RDF example, one might want to examine the changes in the resolution of an image that may have been improved over time as image resolution algorithms improve. IMO, a more recent document would be the same image, just with more detail. Arguing that the exact dimensions and bit representation of the image have changed, but not the resource, would currently be accepted if the file format changed, because new Accept possibilities can be added without changing the nature of the web resource. However, if the file format didn't change, currently we are not sure, but it seems as though it should be treated as a new image resource. This is a contradiction, IMO, because we have already said that the bit representations can be non-identical and the resulting representations can still identify the same resource, based on the use of Accept headers.

In a semi-serious example, if the resource is strictly different every time something changes, there would be a never-ending circle of updates necessary if two or more documents started out unlinked, but wanted to link to the other documents in the strictest manner possible.
If semi-constant identifiers are not allowed, every time a document was updated the new document would receive a new identifier, which would require an update to the other document if the owners of that document wanted their users to have a link to a document that linked back to them. This update would require a resource locator change, which would then allow the other document producer to update both the link and the resource URI to keep its users up to date. In my opinion it is a very good thing to allow locators to stay semi-constant, as the web architecture documentation might reasonably be thought to represent the real web in some way, which it would not do if this example were taken seriously.

It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time-based checkpoints.

Cheers, Peter
Re: RDF Update Feeds + URI time travel on HTTP-level
On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell ansell.pe...@gmail.com wrote:

It should be up to resource creators to determine when the nature of a resource changes across time. A web architecture that requires every single edit to have a different identifier is a large hassle and likely won't catch on if people find that they can work fine with a system that evolves constantly using semi-constant identifiers, rather than through a series of mandatory time-based checkpoints.

You seem to have read more into my argument than was there, and created a strawman; I agree with the above. My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore a request for a representation of the state of Resource X at time T needs to be directed at the URI for "Resource X at time T", not "Resource X".

Mark.
Re: RDF Update Feeds + URI time travel on HTTP-level
2009/11/23 Mark Baker dist...@acm.org: [snip]

You seem to have read more into my argument than was there, and created a strawman; I agree with the above.

I did take some literary privilege. The strawman was intended to be knocked down in the same argument.

My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore a request for a representation of the state of Resource X at time T needs to be directed at the URI for "Resource X at time T", not "Resource X".

The issue with requiring people to direct requests at the URI for "Resource X at time T" is that the circular linking issue I described previously comes into play, because people need to pre-engineer their URIs to be compatible with a temporal dimension. If users didn't know exactly what time scales were used by the server, they would either need to follow a roughly drawn-up convention, such as /YYYY/MM/DD/meaningfulresourcename, or they would have to find an index somewhere, neither of which is as promising for the future of the web as having the ability to add another header to provide the desired behaviour, IMO.

The documentation of the Vary header [1] seems to leave the situation open as to whether the server needs to be concerned about which, or any, headers dictate which resource representation is to be returned. Caching in the context of HTTP/1.1 may have been designed to be temporary, but I see no particular reason why a temporal Accept-* header, together with the possibility of its addition to Vary, couldn't be used in the absolute time dimension. It seems much cleaner than adding an extra method to HTTP, or requiring some other non-HTTP mechanism altogether. The extra header would never stop a server from returning the current version if it doesn't recognise the header, or if it doesn't keep a version history, so it should be completely backwards compatible.

Cheers, Peter

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44
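[Peter's backwards-compatibility argument can be sketched concretely. The following is a hypothetical WSGI handler, not anything from the Memento code: the version store, URI layout and redirect-to-archive behaviour are assumptions; a server that has never heard of the header simply ignores it and serves the current version.]

    # Hypothetical sketch: a server that knows the experimental
    # X-Accept-Datetime header redirects to an archived version and declares
    # the header in Vary; everything about the store and URIs is made up.
    from wsgiref.simple_server import make_server
    from email.utils import parsedate_to_datetime

    # Toy version store: resource path -> list of (HTTP-date, archived URI),
    # oldest first. Purely illustrative.
    VERSIONS = {
        "/page": [
            ("Mon, 01 Jan 2007 00:00:00 GMT", "/archive/2007/page"),
            ("Thu, 03 Jan 2008 00:00:00 GMT", "/archive/2008/page"),
        ],
    }

    def app(environ, start_response):
        path = environ["PATH_INFO"]
        asked = environ.get("HTTP_X_ACCEPT_DATETIME")
        if asked and path in VERSIONS:
            want = parsedate_to_datetime(asked)
            best = None
            # Pick the newest archived version at or before the asked datetime.
            for stamp, uri in VERSIONS[path]:
                if parsedate_to_datetime(stamp) <= want:
                    best = uri
            if best:
                start_response("302 Found", [("Location", best),
                                             ("Vary", "X-Accept-Datetime")])
                return [b""]
        # No header, unknown resource, or nothing old enough: current version.
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Vary", "X-Accept-Datetime")])
        return [b"current representation of " + path.encode()]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()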
Re: RDF Update Feeds
Georgi, Hugh,

Could be very simple by expressing: pull our update-stream once per second/minute/hour in order to be *enough* up-to-date.

Ah, Georgi, I see. You seem to emphasise the quantitative side, whereas I just seem to want to flag what kind of source it is. I agree that "pull our update-stream once per second/minute/hour in order to be *enough* up-to-date" should be available; however, I think that the information regular/irregular vs. how frequent the updates are should be made available as well.

My main use case is motivated from the LOD application-writing area. I figured that I quite often have written code that essentially does the same: based on the type of data source, it either gets a live copy of the data or uses already locally available data. Now, given that data set publishers would declare the characteristics of their datasets in terms of dynamics, one could write such a LOD cache quite easily, I guess, abstracting the necessary steps and hence offering a reusable solution. I'll follow up on this one soon via a blog post with a concrete example.

My main question would be: what do we gain if we explicitly represent these characteristics, compared to what HTTP provides in terms of caching [1]? One might want to argue that the 'built-in' features are sort of too fine-granular and there is a need for a data-source-level solution.

We currently put things like <changefreq>monthly</changefreq>, <changefreq>daily</changefreq> and <changefreq>never</changefreq> in our semantic sitemaps, and these suggestions seem very similar. E.g. http://dotac.rkbexplorer.com/sitemap.xml (and I think these frequencies may correspond to normal sitemaps). So a naive approach, if you want RDF, would be to use something very similar (and simple). Of course I am probably known for my naivety, which is often misplaced.

Hugh, of course you're right (as often ;). Technically, this sort of information ('changefreq') is available via sitemaps. Essentially, one could lift this to RDF straightforwardly, if desired. If you look closely at what I propose, however, then you'll see that I aim at a sort of qualitative description which could drive my LOD cache (along with the other information I already have from the void:Dataset). Now, before I continue to argue here on a purely theoretical level, lemme implement a demo and come back once I have something to discuss ;)

Cheers, Michael

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html

--
Dr. Michael Hausenblas
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

From: Hugh Glaser h...@ecs.soton.ac.uk
Subject: Re: RDF Update Feeds [snip]
Re: RDF Update Feeds
2009/11/21 Michael Hausenblas michael.hausenb...@deri.org: [snip]

Now, given that data set publishers would declare the characteristics of their datasets in terms of dynamics, one could write such a LOD cache quite easily, I guess, abstracting the necessary steps and hence offering a reusable solution.

If you want to do polling based on single resources at regular (i.e., less than daily) intervals, then you are likely to flood the server just looking for potential updates in cases where the server really doesn't know how often a particular resource is going to be updated, such as DBpedia-live, where the update rate is completely reliant on the amount of activity on Wikipedia, which is likely to spike at certain times, then even out, and possibly drop off for months at a time.

Using a change feed with clients polling once per period on a sliding-window feed will break down whenever the temporary update rate is so fast that a full window of the feed passes before clients do consecutive polls on the update feed. There is no way to guarantee what the maximum update rate for DBpedia-live is, for example, so the published update rate would have to simply be as often as the server can handle, based on the size of the RSS file required to publish information about which resources have been recently updated. The main reason that RSS isn't useful for consistency, IMO, is that it relies on clients updating very regularly; otherwise they actually miss out permanently on information, and the RSS reader application contains only a limited set of what was really published on the feed.

The mechanism that DBpedia-live uses to monitor Wikipedia might be a candidate; however, it still suffers from issues with clients dropping out for periods of time and either missing updates or causing large spikes when they come back online. If clients do not receive the notifications for a day on DBpedia-live, could they possibly catch up without performing a DoS on the server trying to poll all of the announcements that they missed out on?

If this is going to work and minimise bandwidth usage, there needs to be some mechanism to enable clients to check if information is newer than the cached information without any actual RDF information being transferred. Currently RDF databases don't support this, and it is particularly hard to support where the GRAPH used in the database is not meant to be a single document, such as http://dbpedia.org on DBpedia.

Cheers, Peter
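[For the "check without transferring any RDF" requirement, plain HTTP conditional requests already get part of the way, per document rather than per dataset. A sketch, assuming the server emits validators (ETag/Last-Modified) and honours the matching conditional headers, which many web servers do but SPARQL endpoints and quad stores generally do not.]

    # Minimal conditional-GET sketch: re-fetch an RDF document only if it
    # changed since the last poll; a 304 response transfers no RDF at all.
    import urllib.request, urllib.error

    def fetch_if_changed(url, etag=None, last_modified=None):
        req = urllib.request.Request(url)
        if etag:
            req.add_header("If-None-Match", etag)
        if last_modified:
            req.add_header("If-Modified-Since", last_modified)
        try:
            with urllib.request.urlopen(req) as resp:
                return (resp.read(),
                        resp.headers.get("ETag"),
                        resp.headers.get("Last-Modified"))
        except urllib.error.HTTPError as e:
            if e.code == 304:          # unchanged: nothing transferred
                return None, etag, last_modified
            raise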
Re: RDF Update Feeds + URI time travel on HTTP-level
On 20 Nov 2009, at 19:07, Chris Bizer wrote:

just to complete the list of proposals, here another one from Herbert Van de Sompel from the Open Archives Initiative. Memento: Time Travel for the Web http://arxiv.org/abs/0911.1112 The idea of Memento is to use HTTP content negotiation in the datetime dimension. By using a newly introduced X-Accept-Datetime HTTP header they add a temporal dimension to URIs. The result is a framework in which archived resources can seamlessly be reached via the URI of their original.

Interesting! It seems to be most useful for "time travelling" on the web, and would allow me to browse the web as it was at some point in the past, similar to the Wayback Machine [1]. Unlike the Wayback Machine, it would work without a central archive, but only on those servers that implement the proposal, and only with a browser/client that supports the feature.

I don't immediately see how this could be used to synchronize updates between datasets, though. Being able to access past versions of URIs doesn't tell me what has changed throughout the site between then and today.

Sounds cool to me. Anybody an opinion whether this violates general Web architecture somewhere?

From a web architecture POV it seems pretty solid to me. Doing stuff via headers is considered bad if you could just as well do it via links and additional URIs, but you can argue that the time dimension is such a universal thing that a header-based solution is warranted.

The main drawback IMO is that existing clients, such as all web browsers, will be unable to access the archived versions, because they don't know about the header. If you are archiving web pages or RDF documents, then you could add links that lead clients to the archived versions, but that won't work for images, PDFs and so forth.

In summary, I think it's pretty cool. Anyone who has used Apple's Time Machine would probably get a kick out of the idea of doing the same on a web page, zooming into the past on a Wikipedia page or on GitHub or on a weather site. But if you're only interested in doing something for a single site, then an ad-hoc solution based on URIs for old versions is probably more practical.

Best, Richard

[1] http://www.archive.org/web/web.php
Re: RDF Update Feeds
Hello!

Back in April, we had a similar discussion: http://lists.w3.org/Archives/Public/public-lod/2009Apr/0130.html

Concretely, we are having exactly the same problem for syncing up aggregations of BBC RDF data (Talis's and OpenLink's), as our data changes *a lot*. Right now, we're thinking about a really simple feed, detailing a) if a change event is a delete, an update or a create, and b) what thing has changed. That's a start, but should be enough to sync up with our data.

Cheers, y

2009/11/18 Niklas Lindström lindstr...@gmail.com: [snip]
Re: RDF Update Feeds
Georgi, All,

I like the discussion, and as it seems to be a recurrent pattern, as pointed out by Yves (which might be a sign that we need to invest some more time into it), I've tried to sum up a bit and started a straw-man proposal for a more coarse-grained solution [1]. Looking forward to hearing what you think ...

Cheers, Michael

[1] http://esw.w3.org/topic/DatasetDynamics

--
Dr. Michael Hausenblas
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

From: Georgi Kobilarov georgi.kobila...@gmx.de
Subject: RDF Update Feeds [snip]
Re: RDF Update Feeds
At the Library of Congress we've been experimenting with using an Atom feed to alert subscribers to new resources available at id.loc.gov [1]. The approach is similar to what Niklas is doing, although we kind of independently arrived at this approach (which was nice to discover).

Creates, updates and deletes happen on a weekly basis, so it's important for us to let interested parties know what has changed. We ended up using Atom Tombstones [2] for representing the deletes, and Atom Feed Paging and Archiving (RFC 5005) [3] to allow clients to drill backwards through time. I just noticed Link Relations for Simple Version Navigation [4] get announced on an Atom-related discussion list, which looks like it could be useful as well, if you maintain a version history.

I'd be interested in any feedback anyone has about using this approach.

//Ed

[1] http://id.loc.gov/authorities/feed/
[2] http://ietfreport.isoc.org/all-ids/draft-snell-atompub-tombstones-06.txt
[3] http://tools.ietf.org/rfc/rfc5005.txt
[4] http://www.ietf.org/id/draft-brown-versioning-link-relations-03.txt
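[A consumer of such a feed might look roughly like the sketch below. The feed URL is the one Ed gives; the tombstone namespace comes from the Atom Tombstones draft and the rel="prev-archive" traversal from RFC 5005; the exact shape of the id.loc.gov feed beyond that is assumed.]

    # Rough sketch of a client for an Atom feed with RFC 5005 paging and
    # Atom Tombstones. Namespaces are from the Atom spec and the tombstones
    # I-D; feed details are assumptions.
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"
    AT = "{http://purl.org/atompub/tombstones/1.0}"

    def harvest(feed_url):
        """Yield ('updated'|'deleted', entry id) pairs, walking back in time."""
        while feed_url:
            with urllib.request.urlopen(feed_url) as f:
                root = ET.parse(f).getroot()
            for entry in root.findall(ATOM + "entry"):
                yield "updated", entry.findtext(ATOM + "id")
            for tomb in root.findall(AT + "deleted-entry"):
                yield "deleted", tomb.get("ref")
            # RFC 5005: follow rel="prev-archive" to older pages, if any.
            feed_url = next((l.get("href") for l in root.findall(ATOM + "link")
                             if l.get("rel") == "prev-archive"), None)

    for action, ident in harvest("http://id.loc.gov/authorities/feed/"):
        print(action, ident)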
Re: RDF Update Feeds
Hi Michael,

Michael Hausenblas wrote: [snip]

Thanks for setting this up. To me, it is not only the dynamics of the data that matters, but also the ability to get notified of changes, to track the changes, to find out what has been changed, and to find explanations and evidence justifying the changes. I don't think /dynamics/ could cover all of these. Would the vocabulary in [1] also consider use cases other than dynamics?

It seems that some of the above use cases have been discussed somehow in the previous threads. I would be very interested to see them continued :).

Cheers, Jun

[1] http://esw.w3.org/topic/DatasetDynamics
Re: RDF Update Feeds
Ed Summers wrote: [snip]

In a nutshell, +1 for this approach.

--
Regards,
Kingsley Idehen
Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software
Web: http://www.openlinksw.com
Re: RDF Update Feeds
Kingsley Idehen wrote:

Ed Summers wrote: [snip]

In a nutshell, +1 for this approach.

Is this not the same as (or very similar to) the COURT approach outlined here: http://code.google.com/p/court/ by Niklas?
Re: RDF Update Feeds
On Fri, Nov 20, 2009 at 11:05 AM, Nathan nat...@webr3.org wrote:

Is this not the same as (or very similar to) the COURT approach outlined here: http://code.google.com/p/court/ by Niklas?

Yes, absolutely. Although I had no idea of Niklas's work at the time. That's why I said: "The approach is similar to what Niklas is doing, although we kind of independently arrived at this approach (which was nice to discover)." :-)

//Ed
Re: RDF Update Feeds
Ed Summers wrote: [snip]

Nathan / Niklas,

+1 for both, and a nice showcase re. serendipitous collective intelligence :-)

--
Regards,
Kingsley Idehen
Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software
Web: http://www.openlinksw.com
RE: RDF Update Feeds
Hi Michael,

nice write-up on the wiki! But I think the vocabulary you're proposing is too generally descriptive. Dataset publishers, once offering update feeds, should not only tell that/if their datasets are dynamic, but how dynamic they are. Could be very simple by expressing: pull our update-stream once per second/minute/hour in order to be *enough* up-to-date.

Makes sense?

Cheers, Georgi

--
Georgi Kobilarov
www.georgikobilarov.com

-----Original Message-----
From: Michael Hausenblas [mailto:michael.hausenb...@deri.org]
Subject: Re: RDF Update Feeds [snip]
Re: RDF Update Feeds + URI time travel on HTTP-level
Hi Michael, Georgi and all,

just to complete the list of proposals, here is another one, from Herbert Van de Sompel of the Open Archives Initiative.

Memento: Time Travel for the Web
http://arxiv.org/abs/0911.1112

The idea of Memento is to use HTTP content negotiation in the datetime dimension. By using a newly introduced X-Accept-Datetime HTTP header, they add a temporal dimension to URIs. The result is a framework in which archived resources can seamlessly be reached via the URI of their original.

Sounds cool to me. Anybody an opinion whether this violates general Web architecture somewhere? Anybody aware of other proposals that work on HTTP-level?

Have a nice weekend,

Chris

-----Ursprüngliche Nachricht-----
Von: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] Im Auftrag von Georgi Kobilarov [snip]
Re: RDF Update Feeds
Sorry if I have missed something, but...

We currently put things like <changefreq>monthly</changefreq>, <changefreq>daily</changefreq> and <changefreq>never</changefreq> in our semantic sitemaps, and these suggestions seem very similar. E.g. http://dotac.rkbexplorer.com/sitemap.xml (and I think these frequencies may correspond to normal sitemaps). So a naive approach, if you want RDF, would be to use something very similar (and simple). Of course I am probably known for my naivety, which is often misplaced.

Best

Hugh

On 20/11/2009 17:47, Georgi Kobilarov georgi.kobila...@gmx.de wrote: [snip]
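[Hugh's sitemap route is straightforward to consume programmatically; a sketch that reads the standard changefreq hints from the sitemap he links. Only the standard sitemap namespace is handled; the semantic-sitemap extension elements would need their own namespace handling, which is omitted here.]

    # Sketch: read per-URL changefreq hints from an ordinary sitemap.
    import urllib.request
    import xml.etree.ElementTree as ET

    SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

    def changefreqs(sitemap_url):
        with urllib.request.urlopen(sitemap_url) as f:
            root = ET.parse(f).getroot()
        for url in root.findall(SM + "url"):
            yield url.findtext(SM + "loc"), url.findtext(SM + "changefreq")

    for loc, freq in changefreqs("http://dotac.rkbexplorer.com/sitemap.xml"):
        print(freq or "unspecified", loc)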
Re: RDF Update Feeds
Hi,

On 17 Nov 2009, at 15:45, Georgi Kobilarov wrote:

Hi all, I'd like to start a discussion about a topic that I think is getting increasingly important: RDF update feeds. The linked data project is starting to move away from releases of large data dumps towards incremental updates. But how can services consuming RDF data from linked data sources get notified about changes?

What about using RSS feeds (w/ RDF extensions) combined with RSSCloud [1] or PubSubHubbub [2]?

Best,

Alex.

[1] http://rsscloud.org/
[2] http://code.google.com/p/pubsubhubbub/

Is anyone aware of activities to standardize such RDF update feeds, or at least aware of projects already providing any kind of update feed at all? And related to that: How do we deal with RDF diffs?

Cheers, Georgi

--
Dr. Alexandre Passant
Digital Enterprise Research Institute
National University of Ireland, Galway
:me owl:sameAs http://apassant.net/alex .
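[For the push option Alexandre mentions, a PubSubHubbub subscription is just a form-encoded POST to a hub. A sketch against the 0.x spec; all three URLs are illustrative placeholders.]

    # Sketch of a PubSubHubbub (0.x) subscription request: the subscriber
    # asks the hub to POST new feed content to its callback whenever the
    # topic feed changes. The URLs below are illustrative placeholders.
    import urllib.parse, urllib.request

    def subscribe(hub, topic, callback):
        data = urllib.parse.urlencode({
            "hub.mode": "subscribe",
            "hub.topic": topic,        # the update feed to watch
            "hub.callback": callback,  # where the hub pushes updates
            "hub.verify": "async",     # hub verifies intent via the callback
        }).encode()
        with urllib.request.urlopen(hub, data) as resp:
            # 202 Accepted: verification pending; 204: verified synchronously.
            return resp.status

    print(subscribe("http://pubsubhubbub.appspot.com/",
                    "http://example.org/dataset/updates.atom",
                    "http://consumer.example.org/push-callback"))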
Re: RDF Update Feeds
Georgi Kobilarov wrote: [snip]

Is anyone aware of activities to standardize such RDF update feeds, or at least aware of projects already providing any kind of update feed at all? And related to that: How do we deal with RDF diffs?

After thinking about this (perhaps a bit naive myself, as I'm still new), I can't see how this is too complex; in fact, IMHO all the existing ways of handling updates for RSS, Atom etc. seem a bit overkill to me.

An update (or changeset, as I'm thinking about it) is essentially nothing more than "this triple has been removed and this one has been added" - on a triple level we don't have an update, it's very much the equivalent of replace; thus an update for a single triple is a case of "remove old triple, insert new one". And thus, without thinking about technologies, all I can see we are left with is something as simple as:

- s1 p1 o1
+ s2 p2 o2

I guess even something like N3 could be extended to accommodate this; given the following example:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix swp: <http://semanticweb.org/id/Property-3A> .
@prefix swc: <http://semanticweb.org/id/Category-3A> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix swivt: <http://semantic-mediawiki.org/swivt/1.0#> .
@prefix sw: <http://semanticweb.org/id/> .

sw:ESWC2010 swp:Title "7th Extended Semantic Web Conference"^^<http://www.w3.org/2001/XMLSchema#string> ;
  rdfs:label "ESWC2010" ;
  a swc:Conference ;
  swp:Event_in_series wiki:ESWC ;
  foaf:homepage <http://www.eswc2010.org> ;
  swp:Has_location_city sw:Heraklion ;
  swp:Has_location_country sw:Greece ;
  swp:Start_date "2010-05-30T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  swp:End_date "2010-06-03T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  swp:Abstract_deadline "2009-12-15T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  swp:Paper_deadline "2009-12-22T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  swp:Notification "2010-02-24T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  swp:Camera_ready_due "2010-03-10T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
  rdfs:isDefinedBy <http://semanticweb.org/wiki/Special:ExportRDF/ESWC2010> ;
  swivt:page <http://semanticweb.org/wiki/ESWC2010> .

one could easily add an operator prefix to signify inserts and deletes; in the following example we change the dates of the conference:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix swp: <http://semanticweb.org/id/Property-3A> .
@prefix sw: <http://semanticweb.org/id/> .

- sw:ESWC2010 swp:Start_date "2010-05-30T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
    swp:End_date "2010-06-03T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
+ sw:ESWC2010 swp:Start_date "2010-06-01T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
    swp:End_date "2010-06-04T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .

Once you've got the notation or concept down, then everything else will fall into place; we can create update streams, or release changesets at X interval, notify by ping, or poll, or whatever.
I dare say you could even handle the same thing in RDF itself by having the graph IRI on the left, making up a quick ontology with, say, rdfu:add and rdfu:delete, and storing the triples as an XML literal on the right, so: graph_iri rdfu:add rdfpacket .

<http://domain.org/mygraph> rdfu:add """<rdf:RDF
    xmlns:log="http://www.w3.org/2000/10/swap/log#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:sw="http://semanticweb.org/id/"
    xmlns:swp="http://semanticweb.org/id/Property-3A">
  <rdf:Description rdf:about="http://semanticweb.org/id/ESWC2010">
    <sw:Property-3AEnd_date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-06-03T00:00:00</sw:Property-3AEnd_date>
    <sw:Property-3AStart_date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-05-30T00:00:00</sw:Property-3AStart_date>
  </rdf:Description>
</rdf:RDF>"""^^rdf:XMLLiteral .

As for implementing: if server X were to build up a changeset in this form and release it daily/hourly/incrementally, and server Y could consume and handle these changesets, then we'd be about done as far as I can see?

Reminder: I am very new to this, so if it's all way off - please disregard.

Regards, Nathan
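[Nathan's +/- changeset can be computed mechanically from two versions of a graph. A sketch using rdflib, which is an assumption here, not part of the proposal; any store that can hand back triples as a set would work the same way. It emits removals and additions one statement per line, in his notation.]

    # Sketch of computing a "+/-" changeset between two graph versions.
    from rdflib import Graph

    def changeset(old_path, new_path):
        old, new = Graph(), Graph()
        old.parse(old_path, format="n3")
        new.parse(new_path, format="n3")
        removed = set(old) - set(new)   # triples only in the old version
        added = set(new) - set(old)     # triples only in the new version
        lines = ["- %s %s %s ." % (s.n3(), p.n3(), o.n3())
                 for s, p, o in sorted(removed)]
        lines += ["+ %s %s %s ." % (s.n3(), p.n3(), o.n3())
                  for s, p, o in sorted(added)]
        return "\n".join(lines)

    # e.g. print(changeset("eswc2010-old.n3", "eswc2010-new.n3"))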
Re: RDF Update Feeds
Nathan wrote: Georgi Kobilarov wrote: Hi all, I'd like to start a discussion about a topic that I think is getting increasingly important: RDF update feeds. The linked data project is starting to move away from releases of large data dumps towards incremental updates. But how can services consuming rdf data from linked data sources get notified about changes? Is anyone aware of activities to standardize such rdf update feeds, or at least aware of projects already providing any kind of update feed at all? And related to that: How do we deal with RDF diffs? After thinking about this (perhaps a bit naive myself as still new) I can't see how this is too complex, infact imho all the existing ways of handling updates for rss, atom etc seem a bit over kill to me. an update (or changeset as I'm thinking about it) is essentially nothing more than this triple has been removed and this one has been added - on a triple level we don't have a update, it's very much the equivalent of replace; thus an update for a single triple is a case of remove old triple, insert new one. and thus, without thinking about technologies, all I can see we are left with is as simple as: - s1 p1 o1 + s2 p2 o2 i guess even something like n3 could be extended to accommodate this: given the following example @prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# . @prefix foaf: http://xmlns.com/foaf/0.1/ . @prefix owl: http://www.w3.org/2002/07/owl# . @prefix swp: http://semanticweb.org/id/Property-3A . @prefix swc: http://semanticweb.org/id/Category-3A . @prefix rdfs: http://www.w3.org/2000/01/rdf-schema# . @prefix swivt: http://semantic-mediawiki.org/swivt/1.0# . @prefix sw: http://semanticweb.org/id/ . sw:ESWC2010 swp:Title 7th Extended Semantic Web Conference^^http://www.w3.org/2001/XMLSchema#string ; rdfs:label ESWC2010 ; a swc:Conference ; swp:Event_in_series wiki:ESWC ; foaf:homepage http://www.eswc2010.org ; swp:Has_location_city sw:Heraklion ; swp:Has_location_country sw:Greece ; swp:Start_date 2010-05-30T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:End_date 2010-06-03T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:Abstract_deadline 2009-12-15T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:Paper_deadline 2009-12-22T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:Notification 2010-02-24T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:Camera_ready_due 2010-03-10T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; rdfs:isDefinedBy http://semanticweb.org/wiki/Special:ExportRDF/ESWC2010 ; swivt:page http://semanticweb.org/wiki/ESWC2010 . one could easily add in an operator prefix to signify inserts and deletes; in the following example we change the dates of the conference @prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# . @prefix swp: http://semanticweb.org/id/Property-3A . @prefix sw: http://semanticweb.org/id/ . - sw:ESWC2010 swp:Start_date 2010-05-30T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:End_date 2010-06-03T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime . + sw:ESWC2010 swp:Start_date 2010-06-01T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime ; swp:End_date 2010-06-04T00:00:00^^http://www.w3.org/2001/XMLSchema#dateTime . once you've got the notation or concept down then everything else will fall in to place; we can create update streams, or release change sets on X interval, notify by ping, or poll or whatever. 
I dare say you could even handle the same thing in RDF itself, by having the graph IRI on the left, making up a quick ontology with, say, rdfu:add and rdfu:delete, and storing the triples as an XML literal on the right, so: graph_iri rdfu:add rdfpacket . For example:

  <http://domain.org/mygraph> rdfu:add """<rdf:RDF
      xmlns:log="http://www.w3.org/2000/10/swap/log#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:sw="http://semanticweb.org/id/"
      xmlns:swp="http://semanticweb.org/id/Property-3A">
    <rdf:Description rdf:about="http://semanticweb.org/id/ESWC2010">
      <sw:Property-3AEnd_date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-06-03T00:00:00</sw:Property-3AEnd_date>
      <sw:Property-3AStart_date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-05-30T00:00:00</sw:Property-3AStart_date>
    </rdf:Description>
  </rdf:RDF>"""^^rdf:XMLLiteral .

As for implementing: if one server were to build up a changeset in this form and release it daily/hourly/incrementally, and another server could consume and handle these changesets, then we'd be about done, as far as I can see? Reminder: I am very new to this, so if it's all way off, please disregard.

Sorry, it's late and I forgot to write half the email :-( As for changeset management: I thought changesets could be published through HTTP and pulled using the If-Modified-Since header. 1: client
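To make the remove-then-add semantics above concrete, here is a minimal sketch in Python; the "+/-" line format and the helper name are my own illustration, not anything specified in the thread:

    # Sketch (not from the thread) of applying a line-based "+/-"
    # triple changeset to an in-memory set of triples. Parsing is
    # deliberately naive: no literal handling, no prefixes.
    def apply_changeset(triples, changeset_lines):
        """Apply removals before additions (remove-then-add semantics)."""
        removals, additions = [], []
        for line in changeset_lines:
            line = line.strip()
            if not line:
                continue
            op, statement = line[0], line[1:].strip().rstrip(".").strip()
            triple = tuple(statement.split(None, 2))
            if op == "-":
                removals.append(triple)
            elif op == "+":
                additions.append(triple)
            else:
                raise ValueError("unknown operator: " + op)
        for t in removals:
            triples.discard(t)
        for t in additions:
            triples.add(t)
        return triples

    graph = {("sw:ESWC2010", "swp:Start_date", '"2010-05-30T00:00:00"')}
    apply_changeset(graph, [
        '- sw:ESWC2010 swp:Start_date "2010-05-30T00:00:00" .',
        '+ sw:ESWC2010 swp:Start_date "2010-06-01T00:00:00" .',
    ])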
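And a minimal sketch of the pull side just described: a client polling a changeset URL with If-Modified-Since, using only the Python standard library. The changeset URL is a hypothetical placeholder:

    # Poll a changeset resource over HTTP; a 304 means nothing new.
    import urllib.request
    import urllib.error

    CHANGESET_URL = "http://example.org/data/changesets/latest"  # hypothetical

    def fetch_if_modified(last_modified=None):
        """Return (body, new_last_modified), or (None, last_modified) on 304."""
        req = urllib.request.Request(CHANGESET_URL)
        if last_modified:
            req.add_header("If-Modified-Since", last_modified)
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read(), resp.headers.get("Last-Modified", last_modified)
        except urllib.error.HTTPError as e:
            if e.code == 304:  # not modified: nothing new to apply
                return None, last_modified
            raise

    body, stamp = fetch_if_modified()        # first poll: full fetch
    body, stamp = fetch_if_modified(stamp)   # later polls: 304 if unchanged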
Re: RDF Update Feeds
Hi Nathan!

2009/11/17 Nathan nat...@webr3.org:
> very short non-detailed reply from me!

I appreciate it.

> pub/sub, atom feeds, RDF over XMPP were my initial thoughts on the matter last week - essentially triple (update/publish) streams on a pub/sub basis, decentralized suitably, [snip] then my thoughts switched to the fact that RDF is not XML (or any other serialized format) so to keep it non-limited I guess the concept would need to be specified first then implemented in whatever formats/ways people saw fit, as has been the case with RDF.

I agree that the concept should really be format-independent. But I think it has to be pragmatic and operation-oriented, to avoid never getting there. Atom (feed paging and archiving) is basically designed with exactly this in mind, and it scaled to my use-cases (resources with multiple representations, plus optional attachments), while still being simple enough to work for just RDF updates. The missing piece is the deleted-entry/tombstone, for which there is thankfully at least an I-D. Therefore modelling the approach around these possibilities required a minimum of invention (none really, just some wording to describe the practice), and it seems suited for a wide range of dataset syndication scenarios (not so much real-time, where XMPP may be relevant).

At least this works very well as long as the datasets can be sensibly partitioned into documents (contexts/graphs). But this is, IMHO, the best way to manage RDF anyhow (not least since one can also leverage simple REST principles for editing, and since quad-stores/SPARQL endpoints support named contexts, etc.). But I'd gladly discuss the benefit/drawback ratio of this approach in relation to our and others' scenarios.

(I do think it would be nice to lift the resulting timeline to proper RDF -- e.g. AtomOwl (plus a Deletion for tombstones, provenance and logging, etc.). But these rather complex concepts -- datasources (dataset vs. collection vs. feed vs. page), timelines (entries are *events* for the same resource over time), flat resource manifest concepts, and so on -- require semantic definitions which will probably continue to be debated for quite some time! Atom can be leveraged right now. After all, this is a *very* instrumental aspect for most domains.)

> this subject is probably not something that should be left for long though.. my (personal) biggest worry about 'linked data' is that junk data will be at an all time high, if not worse, and not nailing this on the head early on (as in weeks/months at max) could contribute to the mess considerably.

Couldn't agree with you more. A common, direct (and simple enough) way of syndicating datasets over time would be very beneficial, and shared practices for that seem to be lacking today.

COURT http://purl.org/net/court is publicly much of a strawman right now, but I would like to flesh it out: primarily regarding the use of Atom I've described, but also with details of our implementation (the Swedish legal information system), concerning collection and storage, proposed validation and URI-minting/verifying strategies, lifting the timeline for logging, etc. (In what form and where the project's actual source code will be public remains to be decided (though open-sourcing it has always been the official plan). Time permitting, I will push my own work in the same vein there for reuse and reference. Regardless, I trust the approach to be simple enough to be implementable from reading this mail-thread alone. ;) )

Best regards, Niklas Lindström
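As a concrete illustration of consuming such a feed, a small Python sketch that walks an Atom feed mixing entries and tombstones. It assumes the at:deleted-entry element and namespace from the tombstones I-D mentioned above (eventually published as RFC 6721); the feed URL is a placeholder:

    # Walk an Atom feed, yielding update and delete events in feed order.
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"
    AT = "{http://purl.org/atompub/tombstones/1.0}"  # tombstones namespace

    def read_feed(url):
        """Yield ('updated', entry_id) and ('deleted', ref) events."""
        with urllib.request.urlopen(url) as resp:
            root = ET.parse(resp).getroot()
        for child in root:
            if child.tag == ATOM + "entry":
                yield "updated", child.findtext(ATOM + "id")
            elif child.tag == AT + "deleted-entry":
                yield "deleted", child.get("ref")

    for event, resource in read_feed("http://example.org/dataset/feed.atom"):
        print(event, resource)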
RDF Update Feeds
Hi all, I'd like to start a discussion about a topic that I think is getting increasingly important: RDF update feeds. The linked data project is starting to move away from releases of large data dumps towards incremental updates. But how can services consuming rdf data from linked data sources get notified about changes? Is anyone aware of activities to standardize such rdf update feeds, or at least aware of projects already providing any kind of update feed at all? And related to that: How do we deal with RDF diffs? Cheers, Georgi -- Georgi Kobilarov www.georgikobilarov.com
Re: RDF Update Feeds
On Tue, 2009-11-17 at 16:45 +0100, Georgi Kobilarov wrote: How do we deal with RDF diffs?

Talis' changeset vocab is a good start: http://n2.talis.com/wiki/Changesets

It has enough detail for changes to be rewound, replayed, etc.

-- Toby A Inkster mailto:m...@tobyinkster.co.uk http://tobyinkster.co.uk
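For illustration, a minimal Python sketch of building such a changeset with rdflib (rdflib is my choice here, not something the thread prescribes; the cs: terms are recalled from the Changesets wiki page above and should be verified against it):

    # Build a Talis-style changeset: reified removal/addition statements
    # hung off a cs:ChangeSet node, so the change can be rewound/replayed.
    from rdflib import Graph, Namespace, Literal, BNode
    from rdflib.namespace import RDF, XSD

    CS = Namespace("http://purl.org/vocab/changeset/schema#")  # assumed
    SW = Namespace("http://semanticweb.org/id/")
    SWP = Namespace("http://semanticweb.org/id/Property-3A")

    def reify(g, s, p, o):
        """Add an rdf:Statement node describing triple (s, p, o)."""
        st = BNode()
        g.add((st, RDF.type, RDF.Statement))
        g.add((st, RDF.subject, s))
        g.add((st, RDF.predicate, p))
        g.add((st, RDF.object, o))
        return st

    g = Graph()
    change = BNode()
    g.add((change, RDF.type, CS.ChangeSet))
    g.add((change, CS.subjectOfChange, SW.ESWC2010))
    g.add((change, CS.changeReason, Literal("Conference dates moved")))
    old = Literal("2010-05-30T00:00:00", datatype=XSD.dateTime)
    new = Literal("2010-06-01T00:00:00", datatype=XSD.dateTime)
    g.add((change, CS.removal, reify(g, SW.ESWC2010, SWP.Start_date, old)))
    g.add((change, CS.addition, reify(g, SW.ESWC2010, SWP.Start_date, new)))
    print(g.serialize(format="turtle"))  # str in rdflib 6+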
Re: RDF Update Feeds
Georgi Kobilarov wrote: [...] And related to that: How do we deal with RDF diffs?

There have been a few suggestions over the years. [1] immediately jumps to mind, for example.

Would SPARQL Update work as a patch format? Generating it might be tricky, and I'm not sure I fancy running syndicated updates without checking them first. On the other hand, I interact with larger stores using SPARQL, so simply recording the updates sent would work.

Damian

[1] http://www.w3.org/DesignIssues/Diff
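A Python sketch of the "record the updates sent" idea: the same SPARQL Update string serves as both the patch and the log entry. The endpoint URL is a placeholder, and POSTing application/sparql-update follows the SPARQL 1.1 protocol, which postdates this thread:

    # Apply a SPARQL Update to a store, then append it to a replayable log.
    import urllib.request

    ENDPOINT = "http://example.org/sparql/update"  # hypothetical

    patch = """
    PREFIX swp: <http://semanticweb.org/id/Property-3A>
    PREFIX sw:  <http://semanticweb.org/id/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    DELETE DATA { sw:ESWC2010 swp:Start_date "2010-05-30T00:00:00"^^xsd:dateTime } ;
    INSERT DATA { sw:ESWC2010 swp:Start_date "2010-06-01T00:00:00"^^xsd:dateTime }
    """

    def apply_and_log(update, logfile="updates.log"):
        req = urllib.request.Request(
            ENDPOINT, data=update.encode("utf-8"),
            headers={"Content-Type": "application/sparql-update"})
        urllib.request.urlopen(req)              # raises on HTTP errors
        with open(logfile, "a") as log:
            log.write(update.strip() + "\n;\n")  # ';' separates operations

    apply_and_log(patch)

Replaying the log file against a second store then reproduces the first store's state, which is Damian's point about recording updates.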
Re: RDF Update Feeds
Damian Steer wrote: There have been a few suggestions over the years. [1] immediately jumps to mind, for example.

We have also integrated functionality for publishing Linked Data updates in Triplify [1]. It's similar to Talis' changeset approach, but works more like publishing a hierarchically structured update log as Linked Data itself. Details can be found here: http://triplify.org/vocabulary/update

Sören

[1] http://triplify.org/

-- Sören Auer, AKSW/Computer Science Dept., University of Leipzig http://www.informatik.uni-leipzig.de/~auer, Skype: soerenauer
Re: RDF Update Feeds
Hi,

We are working on this issue with our DSNotify [1] approach. Our solution is based on indexing subgraphs of available LD graphs and deriving feature vectors (FVs) for each indexed resource. By comparing the sets of newly detected, recently removed, and indexed FVs, we can detect create, remove, update, and move [2] events in LD sources. These events are logged and can be accessed via a Java API, an XML-RPC interface, and an HTTP interface.

We are also developing a vocabulary (and a corresponding API) that can be used to describe so-called eventsets: sets of events that occurred in a particular data source. This vocab is based on LODE and SCOVO, and a first draft will be published soon on our website.

But DSNotify is not ready to index the whole Web of Data. It may rather be used as an add-on for particular data providers that want to keep a high level of link integrity in their data (because the reported events may be used by the data provider to update its hosted data/links).

Other related approaches:
- Triplify's Linked Data Update Log [3]
- Silk's Web of Data Link Maintenance Protocol [4]

best regards, Niko

[1] http://dsnotify.org/
[2] The main purpose of DSNotify is to detect move events in data sources, i.e., when resources are published under different identifiers (e.g., under a different HTTP URI). Although this should not happen in theory (URIs should be cool), it happens quite often in reality; see our paper for details.
[3] http://triplify.org/vocabulary/update
[4] http://www4.wiwiss.fu-berlin.de/bizer/silk/wodlmp/
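A toy Python sketch of the general idea behind this kind of move detection: match resources that disappeared between two index snapshots to newly appeared ones by feature-vector similarity. This is an illustration only, not DSNotify's actual algorithm, features, or thresholds:

    # Classify create/remove/move events between two {uri: fv} snapshots,
    # where each feature vector is a sparse dict of feature weights.
    from math import sqrt

    def cosine(a, b):
        """Cosine similarity of two sparse feature vectors (dicts)."""
        dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
        norm = (sqrt(sum(v * v for v in a.values()))
                * sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def detect_events(old_index, new_index, threshold=0.9):
        removed = {u: fv for u, fv in old_index.items() if u not in new_index}
        created = {u: fv for u, fv in new_index.items() if u not in old_index}
        events = []
        for old_uri, old_fv in removed.items():
            best = max(created, key=lambda u: cosine(old_fv, created[u]),
                       default=None)
            if best and cosine(old_fv, created[best]) >= threshold:
                events.append(("move", old_uri, best))  # same thing, new URI
                created.pop(best)
            else:
                events.append(("remove", old_uri, None))
        events += [("create", u, None) for u in created]
        return events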
Re: RDF Update Feeds
[...] links I can attach more RDF partitioned to our needs/restrictions, and just wipe entries if they become too large and publish new repartitioned resources carrying RDF. (In theory this also means that the central system can be replaced with a PURL-like redirector, if the agency websites could be deemed persistent over time (which they currently cannot).)

== Other approaches ==

* The Library of Congress has similar Atom feeds and tombstones for their subject headings: http://id.loc.gov/authorities/feed/ (paged feeds; no explicit archives that I'm aware of, so I'm not sure about the collectability of the entire dataset over time -- this can be achieved with regular paging if you're sure you won't drop items when climbing as the dataset is updated).

* The OAI-PMH http://www.openarchives.org/pmh/ is an older effort with good specifications (though not as RESTful as e.g. Atom, GData, etc.). I'm interested in seeing if they'd be interested in something like COURT as well, since they went for Atom (and RDF) in their OAI-ORE specs http://www.openarchives.org/ore/ ..

* You can use Sitemap extensions http://sw.deri.org/2007/07/sitemapextension/ to expose lists of archive dumps (e.g. http://products.semweb.bestbuy.com/sitemap.xml), which could be crawled incrementally. But I don't know how to easily do deletes without recollecting it all..

* The COURT approach of our system has a rudimentary ping feature so that sources can notify the collector of updated feeds. This could of course be improved by using PubSubHubbub http://pubsubhubbub.googlecode.com/svn/trunk/pubsubhubbub-core-0.2.html, but that's currently not a priority for us.

Best regards, Niklas Lindström

PS. Anyone interested in this COURT approach, *please* contact me; I am looking for ways to formalize this for easy reuse, not least for disseminating government and other open data in a uniform manner -- both on a specification/recommendation level, and for gathering implementations (possibly built upon existing frameworks/content repositories/CMSes).
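For the paging caveat above, a small Python sketch of climbing a paged Atom feed by following rel="next" links (RFC 5005). Archived feeds (rel="prev-archive") avoid the dropped-item problem when the dataset changes mid-climb; the same function works for those by changing the rel argument:

    # Climb a paged Atom feed, yielding entry ids page by page.
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"

    def climb(url, rel="next"):
        """Follow feed-level rel links, yielding entry ids."""
        while url:
            with urllib.request.urlopen(url) as resp:
                root = ET.parse(resp).getroot()
            for entry in root.findall(ATOM + "entry"):
                yield entry.findtext(ATOM + "id")
            # findall on the root only matches feed-level links,
            # not the links nested inside entries.
            url = next((l.get("href") for l in root.findall(ATOM + "link")
                        if l.get("rel") == rel), None)

    # Example using the LoC feed mentioned above (may have changed since).
    seen = set(climb("http://id.loc.gov/authorities/feed/"))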