Ed Summers wrote:

Thanks for posting this Jakob. I was just reading RFC 5005 on the
train yesterday (literally) and the parallels between it and OAI-PMH
struck me as well. It's not quite clear to me how deleted records
would be handled with an atom archive feed. But I guess one could
assume that if the identifier is no longer present, it has been deleted.
But that would require pulling the entire archive... I'm not really
sure how much deletes are really used in OAI-PMH repositories anyhow.
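
To make the "missing identifier means deleted" assumption concrete, here
is a minimal Python sketch. It walks a feed and its archive documents
through the rel="prev-archive" links that RFC 5005 defines, collects
every entry id, and compares the result with the ids seen on an earlier
harvest. The function names and the `fetch` callable (anything that
returns the XML text for a URL) are assumptions for illustration, and
relative link hrefs are not resolved.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def collect_entry_ids(start_url, fetch):
    """Walk a feed and its archives via rel="prev-archive"; return all entry ids.

    `fetch` is a hypothetical callable mapping a URL to the XML text of
    the feed document at that URL.
    """
    ids = set()
    seen = set()
    url = start_url
    while url and url not in seen:
        seen.add(url)  # guard against link loops
        root = ET.fromstring(fetch(url))
        for entry in root.findall(ATOM + "entry"):
            entry_id = entry.findtext(ATOM + "id")
            if entry_id:
                ids.add(entry_id.strip())
        # Follow the link to the next older archive document, if any.
        url = None
        for link in root.findall(ATOM + "link"):
            if link.get("rel") == "prev-archive":
                url = link.get("href")
                break
    return ids


def presumed_deleted(previous_ids, current_ids):
    """Ids seen on an earlier full harvest that are now absent."""
    return set(previous_ids) - set(current_ids)
```

This needs a pass over the complete archive on every check, which is the
cost of not having explicit deletion markers.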

OAI-PMH 1.1 was not clear enough on deletions but in 2.0 the
specification contains an example. I think the missing support of
deletions in data providers has to do with the missing explicit support
in service providers and vice versa (a chicken-and-egg problem).
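
For comparison, OAI-PMH 2.0 marks a deleted record with
status="deleted" on its header element. Below is a minimal Python sketch
of how a service provider might separate such records in a ListRecords
or ListIdentifiers response. The function name is hypothetical and
resumption tokens are ignored.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def split_headers(response_xml):
    """Return (live, deleted) identifier lists from an OAI-PMH 2.0 response."""
    root = ET.fromstring(response_xml)
    live, deleted = [], []
    for header in root.iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        if identifier is None:
            continue
        # A deleted record carries status="deleted" on its header.
        if header.get("status") == "deleted":
            deleted.append(identifier.strip())
        else:
            live.append(identifier.strip())
    return live, deleted
```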

Stuart Weibel has written [1] about the subject of blog archiving in
the past. And I remember hearing Jon Udell and Dan Chudnov talk about
it [2]. Who knows what technorati, bloglines and googlereader are
doing in this area. I guess the reality is that blogs are on the web
and as such will be archived by InternetArchive [3]. But perhaps that
doesn't really fit quite right? That's my feeling.

Thanks. BlogML was new to me - sounds interesting but looks very shaggy
and over-engineered - you do not even get the spec in HTML but have to
download an archive that contains tons of nasty .NET files and an XML
schema instead of a textual description with examples and discussion. I
copied the XML schema here: http://www.gbv.de/wikis/cls/BlogML. I think
extending ATOM is the better way.

I think your general point is correct. Libraries need to be
integrating themselves into the web these days rather than expecting
the web to integrate into them.

I doubt that archiving weblogs is that complicated [1]! You need a
harvester (partly implemented in many feed readers), an archive (you
could start with just saving validated Atom files), an index (Solr?) and
a reader (also already implemented in many feed readers). I bet you
don't need more than a medium-sized project with one or two developers
and one or two years to create sustainable tools for basic weblog
archiving. Such a project could be done by any larger library or archive
that is able to get funding. It's not a lack of resources, it's a lack
of vision.
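
The archive step could start out roughly like the following Python
sketch: accept a harvested document only if it is well-formed XML with
an Atom feed root element, and store it under a content hash so repeated
harvests of an unchanged feed do not pile up. The function name, the
directory layout and the .atom suffix are assumptions, and this checks
well-formedness only, not validity against a schema.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"


def archive_feed(data, archive_dir):
    """Store `data` (bytes) if it is a well-formed Atom feed document.

    Returns the path of the stored file, or None if the document was
    rejected. Unchanged documents map to the same file name.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None  # not well-formed XML
    if root.tag != ATOM_FEED:
        return None  # well-formed, but not an Atom feed document
    target = Path(archive_dir) / (hashlib.sha256(data).hexdigest() + ".atom")
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.write_bytes(data)
    return target
```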

Oh, and would it be alright to add your blog to
http://planet.code4lib.org -- we need more of an international
presence on there IMHO.

The subfeed http://jakoblog.de/category/en/feed/atom/ contains all
English language postings which are probably of higher relevance.


[1] Ok, real long-term preservation *is* complicated but if you only
archive well-formed XML that conforms to a given schema (ATOM, HTML) you
should be in a good position for the next decades.

Jakob Voß <[EMAIL PROTECTED]>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de
