This conversation about Atom is, I think, really an important one to have.
As well designed and thought out as protocols & standards such as OAI-PMH,
METS (and the budding OAI-ORE spec) are, they don't have that "viral"
technology attribute of utter simplicity.  Sure there are trade-offs, but
the tool support and interoperability on a much larger scale that Atom
could provide cannot be denied.  I, too, have pondered the possibility of
Atom (& AtomPub for "writing back") as a simpler replacement for all sorts
of similar technologies (METS, OAI-PMH, WebDAV, etc.).  The simple fact
that Google has standardized all of its web services on
GData (a "flavor" of Atom) cannot be ignored.

I have had some very interesting discussions over on atom-syntax about
thoroughly integrating Atom as a standard piece of infrastructure in a
large digital library project here at UT Austin, and
while I don't necessarily think it would provide a whole lot of benefit as an
internal data transfer mechanism, I see numerous advantages to
standardizing on Atom for any number of outward-facing
services/end-points. I think it would be sad if Atom and AtomPub were seen
only as technologies used by and for blogs/blogging.

Also, re: blog mirroring, I highly recommend the current discussions
floating around the blogosphere regarding distributed source control (Git,
Mercurial, etc.).  It's a fundamental paradigm shift from centralized
control to distributed control that points the way toward the future of
libraries as they (we) become less and less the gatekeepers for the
"stuff," be it digital or physical, and more and more the facilitators of
the "bidirectional replication" that assures ubiquitous access and
long-term preservation.  The library becomes (actually it has already
happened) simply a node on a network of trust and should act accordingly.

See the thoroughly entertaining/thought-provoking Google tech talk by
Linus Torvalds on Git:

-peter keane

On Tue, 23 Oct 2007, Jakob Voss wrote:

Hi Ed,

You wrote:

I completely agree.  When developing software it's really important to
focus on the cleanest/clearest solution, rather than getting bogged
down in edge cases and the comments from naysayers. I hope that my
response didn't come across that way.


A couple of follow-on questions for you:

In your vision for this software are you expecting that content
providers would have to implement RFC 5005 for your archiving system
to work?

Probably yes - at least for older entries. New posts can also be
collected with the default feeds. Instead of working out exceptions and
special solutions for getting blog archives by other methods, you should
provide RFC 5005 plugins for common blog software like WordPress and
advertise its use ("We are sorry - the blog that you asked to archive
does not support RFC 5005 so we can only archive new postings. Please
ask its provider to implement archived feeds so we can archive the
postings before {TIMESTAMP}. More information and plugins for RFC 5005
can be found {HERE}. Thank you!").
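
As a rough illustration of what an archiver would do with such feeds: RFC 5005
archived feeds point back in time through feed-level links with
rel="prev-archive". The sketch below follows that chain. The function names and
the caller-supplied `fetch` callable are assumptions made for this example, not
part of any existing archiving tool.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def prev_archive_url(feed_xml):
    """Return the href of the feed-level rel="prev-archive" link, or None."""
    root = ET.fromstring(feed_xml)
    # Only direct children of <feed> are inspected; entry-level links are ignored.
    for link in root.findall(ATOM_NS + "link"):
        if link.get("rel") == "prev-archive":
            return link.get("href")
    return None


def walk_archives(start_url, fetch):
    """Yield (url, feed_xml) for the starting feed and each older archive.

    `fetch` is a hypothetical caller-supplied function that maps a URL to
    the feed's XML text (for example, a thin wrapper around an HTTP client).
    """
    seen = set()
    url = start_url
    while url and url not in seen:  # `seen` guards against link loops
        seen.add(url)
        feed_xml = fetch(url)
        yield url, feed_xml
        href = prev_archive_url(feed_xml)
        # Resolve relative hrefs against the document that contained them.
        url = urljoin(url, href) if href else None
```

A feed without a prev-archive link ends the walk after one document, which
matches the "we can only archive new postings" case described above.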

Are you considering archiving media files associated with a blog entry
(images, sound, video, etc?).

Well, it depends. There are hundreds of ways to associate media files
- I doubt that you can easily archive YouTube and SlideShare widgets
etc. but images included with <img src="..."/> should be doable. However
I prefer iterative development - if basic archiving works, you can
start to think about media files. By the way, I would value the
comments more - which are also additional and nontrivial to archive.
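
To make the "images should be doable" case concrete, here is a minimal sketch
that collects the image URLs from one entry's HTML content using only the
Python standard library. The class and function names, and the idea of passing
the entry's own URL as `base_url`, are assumptions for illustration; embedded
widgets (YouTube, SlideShare, etc.) are deliberately not handled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect absolute URLs from <img src="..."> tags in an HTML fragment."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img .../> via handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(urljoin(self.base_url, src))


def image_urls(entry_html, base_url):
    """Return the image URLs referenced by one entry's HTML content."""
    parser = ImgSrcCollector(base_url)
    parser.feed(entry_html)
    parser.close()
    return parser.sources
```

An archiver could then download each returned URL alongside the entry itself.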

To begin with, a WordPress plugin is surely the right step. Up to now
RFC 5005 is so new that no one has implemented it yet, although it's not


Jakob Voß <[EMAIL PROTECTED]>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242,
