I don't think it will. There's a core set of fields but their names
should probably be abstractions. I'm trying to think through how this
might work without loads of problems. There are so many applications
for JSPWiki (in terms of how it might fit into other applications)
that we'll need to fit into others' metadata schemes. What I'm
talking about are really surface names for things.

Yes, it will. If the provider has to figure out mapping between different concepts in the database, it'll create problems.

This is exactly why namespaces were invented, and this is also why it would probably be a better idea NOT to reuse Dublin Core, but to stick to our own schema.

Well, yes, but also having the field names match a given schema. Maybe
some kind of transformation feature, dunno.

I think namespaces are quite enough for us. I don't really want to code for the case where someone wants to use "wiki:author" for some other purpose.
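
To illustrate why namespaces should be enough here, a minimal sketch follows. It uses the standard javax.xml.namespace.QName class, which compares names by namespace URI and local part and ignores the prefix. The Dublin Core namespace URI is the real one; WIKI_NS and the "other" namespace are made-up placeholders, not anything JSPWiki defines.

```java
import javax.xml.namespace.QName;

public class NamespaceSketch {
    // Namespace of the Dublin Core element set.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    // Hypothetical namespace URI for our own schema (an assumption for illustration).
    static final String WIKI_NS = "http://example.org/ns/wiki";

    // Two property names are the same only if namespace URI and local part match.
    static boolean sameProperty(QName a, QName b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        QName ours = new QName(WIKI_NS, "author", "wiki");
        // Someone else reusing the "wiki" prefix for a different namespace.
        QName theirs = new QName("http://example.com/ns/other", "author", "wiki");
        // Our namespace bound to a different prefix.
        QName oursOtherPrefix = new QName(WIKI_NS, "author", "w");

        System.out.println(sameProperty(ours, theirs));          // false
        System.out.println(sameProperty(ours, oursOtherPrefix)); // true
    }
}
```

So a clash on the surface string "wiki:author" is harmless as long as the namespace URIs differ.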

If people want, they *can* rewrite their own backend in such a way that it converts everything into paper notes stuck onto a donkey glued to a wall somewhere in Pakistan with the word "CUCKOO" written on the backside - but after the JCR interface, I don't really care what transformations you do.

Well, I also mentioned that I really doubt that I'd be using dc:identifier
for those purposes within the JSPWiki metadata profile. I can also see
creating a suitable ID within our own namespace, but I really think
dc:identifier would suit fine. We'd not be abusing it at all.

Ah yes, now I found it.  From RFC 5013:

<snip>
"Element Name:   identifier

   Label:       Identifier
   Definition:  An unambiguous reference to the resource within a given
                context.
   Comment:     Recommended best practice is to identify the
                resource by means of a string conforming
                to a formal identification system."
</snip>

Whereas from RFC 4287 (Atom):

<snip>
"Its content MUST be an IRI, as defined by [RFC3987].  Note that the
   definition of "IRI" excludes relative references.  Though the IRI
   might use a dereferencable scheme, Atom Processors MUST NOT assume it
   can be dereferenced.

   When an Atom Document is relocated, migrated, syndicated,
   republished, exported, or imported, the content of its atom:id
   element MUST NOT change.  Put another way, an atom:id element
   pertains to all instantiations of a particular Atom entry or feed;
   revisions retain the same content in their atom:id elements.  It is
   suggested that the atom:id element be stored along with the
   associated resource.

   The content of an atom:id element MUST be created in a way that
   assures uniqueness.

   Because of the risk of confusion between IRIs that would be
   equivalent if they were mapped to URIs and dereferenced, the
   following normalization strategy SHOULD be applied when generating
   atom:id elements:

   o  Provide the scheme in lowercase characters.
   o  Provide the host, if any, in lowercase characters.
   o  Only perform percent-encoding where it is essential.
   o  Use uppercase A through F characters when percent-encoding.
   o  Prevent dot-segments from appearing in paths.
   o  For schemes that define a default authority, use an empty
      authority if the default is desired.
   o  For schemes that define an empty path to be equivalent to a path
      of "/", use "/".
   o  For schemes that define a port, use an empty port if the default
      is desired.
   o  Preserve empty fragment identifiers and queries.
   o  Ensure that all components of the IRI are appropriately character
      normalized, e.g., by using NFC or NFKC.

4.2.6.1.  Comparing atom:id

   Instances of atom:id elements can be compared to determine whether an
   entry or feed is the same as one seen before.  Processors MUST
   compare atom:id elements on a character-by-character basis (in a
   case-sensitive fashion).  Comparison operations MUST be based solely
   on the IRI character strings and MUST NOT rely on dereferencing the
   IRIs or URIs mapped from them.

   As a result, two IRIs that resolve to the same resource but are not
   character-for-character identical will be considered different for
   the purposes of identifier comparison.

   For example, these are four distinct identifiers, despite the fact
   that they differ only in case:

      http://www.example.org/thing
      http://www.example.org/Thing
      http://www.EXAMPLE.org/thing
      HTTP://www.example.org/thing

   Likewise, these are three distinct identifiers, because IRI
   %-escaping is significant for the purposes of comparison:

      http://www.example.com/~bob
      http://www.example.com/%7ebob
      http://www.example.com/%7Ebob"

</snip>

I like atom:id much more than the dc:identifier, because
a) atom:id conforms to very precise semantics, including comparison rules (which dc:identifier does not give)
b) atom:id is defined as globally unique and non-dereferenceable (which helps a *lot* when you don't get people assuming that there's something at the end of your IRI)
c) atom:id is defined as an IRI instead of a URI (small difference, but might be important)
d) atom:id is defined as unique across the entire lifespan of the entity, which dc:identifier is not
e) Atom feeds make a lot of sense to use, even in a wiki context (and you need the atom:id anyway)

Since atom:id is a machine-processable entity, having clear, machine-understandable rules as to what it really is, is very, very important. For dc:identifier, it's pretty much handwaving.
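
The comparison rule quoted from RFC 4287 section 4.2.6.1 is simple enough to sketch in a few lines. This is only an illustration, not proposed JSPWiki code; the class and method names are made up. The point is that the identifiers are compared as plain strings, with no normalization and no dereferencing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AtomIdSketch {
    // Character-by-character, case-sensitive comparison of two atom:id values.
    static boolean sameAtomId(String a, String b) {
        return a.equals(b);
    }

    // Counts how many distinct identifiers a list contains under that rule.
    static int distinctIds(String... ids) {
        Set<String> seen = new HashSet<>(Arrays.asList(ids));
        return seen.size();
    }

    public static void main(String[] args) {
        // The four case variants from the RFC example.
        System.out.println(distinctIds(
            "http://www.example.org/thing",
            "http://www.example.org/Thing",
            "http://www.EXAMPLE.org/thing",
            "HTTP://www.example.org/thing")); // 4

        // The three %-escaping variants from the RFC example.
        System.out.println(distinctIds(
            "http://www.example.com/~bob",
            "http://www.example.com/%7ebob",
            "http://www.example.com/%7Ebob")); // 3
    }
}
```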

Not that I'm aware of. DC doesn't get into that kind of thing much
except when you get to things like dates.

I would actually like to use the atom:person construct here, since it has better semantics (it adds an IRI to a name, which can be useful in figuring out across wikis who actually authored what). But it might be easier just to store a local identifier, in which case dc is as good as any.

It certainly suits the role of both dc:creator, editor, translator,
etc. (i.e., very general purpose), anyone who contributes to the
resource.

But again, the definition is a bit handwavy.

Recommendation: Use DCTERMS.format. This is the term used to contain
a format identifier. While I recognise that these discussions tend to
I would need to check if it's okay.

That one is pretty common.

Unfortunately, it just says that the "best practice" is to use something like MIME. Now the problem is that in order to consider e.g. data portability, there's no way to say that "this dcterms:format" means a MIME type. So again, a system processing the information needs to resort to context-sensitive processing (e.g. "ok, so this comes from jspwiki, so it's always a MIME type"). Which isn't really very good. This is why I would like to have an unambiguous "wiki:contentType" definition, which can also be reflected in a non-modifiable pseudoproperty "dcterms:format".

E.g. "wiki:contentType contains a STRING, which denotes the MIME content type of the content as defined in RFC XXXX [MIME]."

For example, if it's just defined as a String, how do you define equivalence rules? Is it okay to put in IMAGE/JPG, or ImAgE/jpG, or image/jpg? If you do not know that these are MIME types, and RFC XXXX defines MIME comparison as case-insensitive, then your application might behave incorrectly.
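
A rough sketch of what an explicit equivalence rule could look like, assuming wiki:contentType holds a bare "type/subtype" string and that type and subtype names compare case-insensitively. The class and method names are hypothetical, and parameters such as charset are deliberately not handled.

```java
import java.util.Locale;

public class ContentTypeSketch {
    // Normalizes a bare "type/subtype" string for comparison.
    // Locale.ROOT avoids locale-specific case mapping surprises.
    static String normalize(String contentType) {
        return contentType.trim().toLowerCase(Locale.ROOT);
    }

    // Case-insensitive equivalence of two content types.
    static boolean sameContentType(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(sameContentType("IMAGE/JPG", "image/jpg")); // true
        System.out.println(sameContentType("ImAgE/jpG", "image/jpg")); // true
        System.out.println("IMAGE/JPG".equals("image/jpg"));           // false
    }
}
```

A consumer that only knows "this is a String" would fall back to the last line and treat the values as different, which is exactly the problem.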

This is really my gripe with Dublin Core - it leaves too much up for interpretation. Which makes it really good for people, but cumbersome for computers.

It's a Big Deal for a lot of people, I probably don't care much either.
I use 'text/wiki' for general purpose wiki text and the application
one above to specifically tag JSPWiki wiki text.

I don't think you can use text/wiki - it's missing the "x-" ;-)

It might be interesting to just adopt the practice other wiki engines are using.

/Janne
