Ed Summers wrote:
Thanks for posting this, Jakob. I was just reading RFC 5005 on the train yesterday (literally) and the parallels between it and OAI-PMH struck me as well. It's not quite clear to me how deleted records would be handled with an Atom archive feed. I guess one could assume that if the identifier is no longer present, it has been deleted. But that would require pulling the entire archive... I'm not really sure how much deletes are really used in OAI-PMH repositories anyhow.
OAI-PMH 1.1 was not clear enough on deletions, but the 2.0 specification contains an example. I think the missing support for deletions in data providers has to do with the missing explicit support in service providers and vice versa (a chicken-and-egg problem).
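To make the 2.0 behaviour concrete: a deleted record is announced with status="deleted" on its header element. The following is only a minimal sketch of how a service provider could separate live from deleted identifiers; the function name split_headers and the sample response with its oai:example.org identifiers are made up for illustration, and a real harvester would also have to handle resumption tokens and errors.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def split_headers(response_xml):
    """Return (live, deleted) identifier lists from a ListIdentifiers
    or ListRecords response. Hypothetical helper, not part of any library."""
    root = ET.fromstring(response_xml)
    live, deleted = [], []
    for header in root.iter(OAI_NS + "header"):
        identifier = header.findtext(OAI_NS + "identifier")
        # OAI-PMH 2.0 flags deleted records with status="deleted".
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    return live, deleted


# Invented sample response, trimmed to the parts that matter here.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2007-10-01</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2007-10-02</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

live, deleted = split_headers(SAMPLE)
print("live:", live)
print("deleted:", deleted)
```

With explicit deletion markers like this, a service provider does not need to pull the entire archive and compare identifier sets.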
Stuart Weibel has written [1] about the subject of blog archiving in the past. And I remember hearing Jon Udell and Dan Chudnov talk about it [2]. Who knows what Technorati, Bloglines and Google Reader are doing in this area. I guess the reality is that blogs are on the web and as such will be archived by the Internet Archive [3]. But perhaps that doesn't really fit quite right? That's my feeling.
Thanks. BlogML was new to me. It sounds interesting but looks very shaggy and over-engineered: you do not even get the spec in HTML but have to download an archive that contains tons of nasty .NET files and an XML schema instead of a textual description with examples and discussion. I copied the XML schema here: http://www.gbv.de/wikis/cls/BlogML. I think extending Atom is the better way.
I think your general point is correct. Libraries need to be integrating themselves into the web these days rather than expecting the web to integrate into them.
I doubt that archiving weblogs is that complicated [1]! You need a harvester (partly implemented in many feed readers), an archive (you could start with just saving validated Atom files), an index (Solr?) and a reader (also already implemented in many feed readers). I bet you don't need more than a medium-size project with one or two developers and one or two years to create sustainable tools for basic weblog archiving. Such a project could be done by any larger library or archive that is able to get funding. It's not a lack of resources, it's a lack of vision.
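To illustrate the harvester part, here is a rough sketch of walking an RFC 5005 archived feed by following its rel="prev-archive" links and collecting entries by atom:id. Everything specific in it is an assumption for the example: the function name harvest, the fetch callback (which stands in for an HTTP client), the example.org URLs and the two tiny feed documents. It also assumes absolute link URLs and simply keeps the first copy of a duplicated entry; storage, validation and indexing are left out.

```python
import xml.etree.ElementTree as ET

# Namespace of Atom (RFC 4287) elements.
ATOM = "{http://www.w3.org/2005/Atom}"


def harvest(start_url, fetch):
    """Follow rel="prev-archive" links from start_url and return a dict
    mapping atom:id to the entry element. `fetch` is a hypothetical
    callback that takes a URL and returns the feed document as a string."""
    entries = {}
    seen = set()
    url = start_url
    while url and url not in seen:  # `seen` guards against link loops
        seen.add(url)
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(ATOM + "entry"):
            entry_id = entry.findtext(ATOM + "id")
            # The newest document is read first, so keep the first copy
            # seen (a simplification for this sketch).
            entries.setdefault(entry_id, entry)
        # Look for the link to the next older archive document.
        url = None
        for link in feed.findall(ATOM + "link"):
            if link.get("rel") == "prev-archive":
                url = link.get("href")
                break
    return entries


# Invented in-memory "web" with a subscription feed and one archive document.
DOCS = {
    "http://example.org/feed": (
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        '<link rel="prev-archive" href="http://example.org/2007/09"/>'
        "<entry><id>urn:example:3</id><title>Third</title></entry>"
        "</feed>"
    ),
    "http://example.org/2007/09": (
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        "<entry><id>urn:example:2</id><title>Second</title></entry>"
        "<entry><id>urn:example:1</id><title>First</title></entry>"
        "</feed>"
    ),
}

harvested = harvest("http://example.org/feed", DOCS.__getitem__)
print(sorted(harvested))
```

The archive step could then be as simple as writing each fetched document to disk once it validates, with the index built from the collected entries.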
Oh, and would it be alright to add your blog to http://planet.code4lib.org -- we need more of an international presence on there IMHO.
The subfeed http://jakoblog.de/category/en/feed/atom/ contains all English-language postings, which are probably of higher relevance.

Jakob

[1] OK, real long-term preservation *is* complicated, but if you only archive well-formed XML that conforms to a given schema (ATOM, HTML) you should be in a good position for the next decades.

-- 
Jakob Voß <[EMAIL PROTECTED]>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de
