On 03/10/2006, at 10:16 PM, Elliotte Harold wrote:

Perhaps it would be helpful to back up a step. Rather than starting with format design let's try to get a list of everything we need to include in this format. E.g.

1. Complete source text for every blog entry.
2. Metadata for each blog entry
     A. Title
     B. Category
     C. Tags
3. All media referenced by relative URLs from blog entry
4. URL of blog entry

What else?

Well here's my list:

1. Complete list of authors and categories defined
2. For each article:
        a. Source text
        b. All the relevant metadata from the Atom spec, namely:
                author, ID, published, rights, title, updated, summary, categories
        c. Some other metadata:
                draft status, syntax of source
        d. "Owned" media, whether linked to in the source text or enclosure
3. For each comment or trackback:
        a. Source text
        b. Atom spec metadata:
                author, ID, title, published, summary, avatar?
        c. Additional metadata:
                pointer to parent article or comment (ie "in-reply-to")
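
To make the list above concrete, here is a rough sketch of it as plain Python data classes. This is only an illustration of the inventory, not a proposed format: the class and field names (BlogExport, Article, Comment, source_syntax, owned_media, in_reply_to) are my own placeholders, and dates are left as plain strings for simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    """A comment or trackback (item 3 in the list)."""
    id: str
    author: str
    published: str
    source_text: str
    in_reply_to: str                 # ID of the parent article or comment
    title: Optional[str] = None
    summary: Optional[str] = None
    avatar: Optional[str] = None     # marked "avatar?" in the list, so optional


@dataclass
class Article:
    """A single article (item 2 in the list)."""
    id: str
    title: str
    author: str
    published: str
    updated: str
    source_text: str
    source_syntax: str               # hypothetical values: "html", "markdown", ...
    draft: bool = False
    rights: Optional[str] = None
    summary: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    owned_media: List[str] = field(default_factory=list)  # URLs of "owned" media


@dataclass
class BlogExport:
    """Everything the export would need to carry (items 1 to 3)."""
    authors: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    articles: List[Article] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)
```

Comments are kept in one flat list and linked by in_reply_to rather than nested, which matches the "pointer to parent" item; nesting would work equally well.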

The tricky bit is defining what is meant by "owned" media.

If we assume that an input to this process is a URL, I would say that "owned" media is any referenced media whose URL resolves to the same host as that input URL. This would preclude separately hosted media (e.g. "images.example.com"), but I don't see how that could be handled easily: how would an importer handle media destined for more than one host?
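
As a minimal sketch of that same-host rule, assuming the exporter knows the URL of the entry a reference appears in (the function name is_owned and the example paths are made up for illustration):

```python
from urllib.parse import urljoin, urlsplit


def is_owned(entry_url: str, reference: str) -> bool:
    """Return True if `reference`, resolved against `entry_url`,
    points at the same host as the entry itself."""
    resolved = urljoin(entry_url, reference)
    # .hostname is lower-cased and excludes any port number
    return urlsplit(resolved).hostname == urlsplit(entry_url).hostname


entry = "http://example.com/blog/2006/10/some-post"
print(is_owned(entry, "images/photo.png"))                      # relative link
print(is_owned(entry, "/media/clip.mov"))                       # absolute path
print(is_owned(entry, "http://images.example.com/photo.png"))   # other host
```

Under this rule the first two references count as owned and the third does not, which is exactly the "images.example.com" case excluded above.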

I don't think it's worthwhile attempting to support arbitrary differences between the paths of the exported data and its desired import location. For example, I wouldn't expect to be able to migrate from http://example.com/blog to http://example.org/my/cool/blog AND have all the relative and absolute links to media magically work.
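
To illustrate why, here is a small sketch using standard URL resolution. The entry and image paths are invented for the example; the point is only that a link written with an absolute path keeps pointing at the old path prefix after the move.

```python
from urllib.parse import urljoin

# Hypothetical link as written in the source text of an entry
link = "/blog/images/photo.png"

old_entry = "http://example.com/blog/2006/some-post"
new_entry = "http://example.org/my/cool/blog/2006/some-post"

# Where the importer would naturally place the migrated image
migrated_media = "http://example.org/my/cool/blog/images/photo.png"

print(urljoin(old_entry, link))  # resolves under /blog/ on the old host
print(urljoin(new_entry, link))  # still resolves under /blog/, not /my/cool/blog/
print(urljoin(new_entry, link) == migrated_media)
```

Making such links work would mean rewriting the source text of every entry on import, which is the kind of magic I'd rather not promise.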

So in a nutshell that's the problem I'm trying to solve.

