On 03/10/2006, at 10:16 PM, Elliotte Harold wrote:
Perhaps it would be helpful to back up a step. Rather than starting
with format design let's try to get a list of everything we need to
include in this format. E.g.
1. Complete source text for every blog entry.
2. Metadata for each blog entry
A. Title
B. Category
C. Tags
3. All media referenced by relative URLs from blog entry
4. URL of blog entry
What else?
Well here's my list:
1. Complete list of authors and categories defined
2. For each article:
a. Source text
b. All the relevant metadata from the Atom spec, namely:
author, ID, published, rights, title, updated, summary,
categories
c. Some other metadata:
draft status, syntax of source
d. "Owned" media, whether linked to from the source text or attached as an enclosure
3. For each comment or trackback:
a. Source text
b. Atom spec metadata:
author, ID, title, published, summary, avatar?
c. Additional metadata:
pointer to parent article or comment (i.e. "in-reply-to")
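To make the list above a bit more concrete, here is a rough sketch of it as a data model in Python. All of the class and field names (BlogExport, Article, Comment, Media, source_syntax and so on) are my own placeholders, not part of any agreed format, and timestamps are kept as plain strings for simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Media:
    """An "owned" media file carried along with the export (hypothetical shape)."""
    path: str           # location relative to the blog root
    content_type: str   # e.g. "image/png"


@dataclass
class Article:
    # Atom-style metadata
    id: str
    title: str
    author: str
    published: str      # timestamp kept as text in this sketch
    updated: str
    # Source text of the entry
    source: str
    # Other metadata
    source_syntax: str = "html"
    draft: bool = False
    rights: Optional[str] = None
    summary: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    media: List[Media] = field(default_factory=list)


@dataclass
class Comment:
    """A comment or trackback; in_reply_to points at the parent's ID."""
    id: str
    author: str
    published: str
    source: str
    in_reply_to: str    # ID of the parent article or comment
    title: Optional[str] = None
    summary: Optional[str] = None
    avatar: Optional[str] = None
    is_trackback: bool = False


@dataclass
class BlogExport:
    """Top-level container: the defined authors and categories, plus content."""
    authors: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    articles: List[Article] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)
```

Keeping comments in a flat list with an in_reply_to pointer, rather than nesting them under articles, is just one possible choice; it lets a comment reply to either an article or another comment without a separate structure for threads.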
The tricky bit is defining what is meant by "owned" media.
If we assume that an input to this process is a URL, I would say that
"owned" media is any referenced media which resolves to the same
host. This would preclude separately-hosted media (e.g.
"images.example.com") but I don't see that this can be handled
easily: how would an importer handle media destined for more than one
host?
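A minimal sketch of that same-host rule in Python, assuming the exporter is given the blog's URL and can see each media reference as it appears in the source text. The function name is_owned is made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def is_owned(blog_url: str, media_ref: str) -> bool:
    """Return True if media_ref resolves to the same host as blog_url.

    media_ref may be relative or absolute; it is resolved against
    blog_url first, then the host names are compared.
    """
    resolved = urljoin(blog_url, media_ref)
    # .hostname is lower-cased and excludes any port number
    return urlparse(resolved).hostname == urlparse(blog_url).hostname


blog = "http://example.com/blog/"
print(is_owned(blog, "images/photo.png"))                      # relative link
print(is_owned(blog, "http://example.com/files/a.mp3"))        # absolute, same host
print(is_owned(blog, "http://images.example.com/photo.png"))   # separate host
```

Under this rule the first two references count as owned and the third does not, which is exactly the "images.example.com" case that gets excluded.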
I don't think it's worthwhile attempting to support arbitrary
differences between paths of the exported data and its desired import
location. For example, I wouldn't expect to be able to migrate from
http://example.com/blog to http://example.org/my/cool/blog AND have
all the relative and absolute links to media magically work.
So in a nutshell that's the problem I'm trying to solve.