Re: Metadata in 3.0 [Was: JSPWiki 3 design notes]

Janne Jalkanen Tue, 05 Feb 2008 08:41:21 -0800

I don't think it will. There's a core set of fields but their names
should probably be abstractions. I'm trying to think through how this
might work without loads of problems. There's so many applications
for JSPWiki (in terms of how it might fit into other applications)
that we'll need to fit into others' metadata schemes. What I'm
talking about are really surface names for things.

Yes, it will. If the provider has to figure out mapping between
different concepts in the database, it'll create problems.

This is exactly why namespaces were invented, and this is also why it
would probably be a better idea NOT to reuse Dublin Core, but to
stick to our own schema.

Well, yes, but also having the field names match a given schema. Maybe
some kind of transformation feature, dunno.

I think namespaces are quite enough for us. I don't really want to
code for the case in case someone wants to use "wiki:author" for some
other purpose.

If people want, they *can* rewrite their own backend in such a way
that it converts everything into paper notes stuck onto a donkey
glued to a wall somewhere in Pakistan with the word "CUCKOO" written
on the backside - but after the JCR interface, I don't really care
what transformations you do.

Well, I also mentioned that I really doubt that I'd be using
dc:identifier
for those purposes within the JSPWiki metadata profile. I can also see
creating a suitable ID within our own namespace, but I really think
dc:identifier would suit fine. We'd not be abusing it at all.


Ah yes, now I found it.  From RFC 5013:

<snip>
"Element Name:   identifier

   Label:       Identifier
   Definition:  An unambiguous reference to the resource within a given
                context.
   Comment:     Recommended best practice is to identify the
                resource by means of a string conforming
                to a formal identification system."
</snip>

Whereas from RFC 4287 (Atom)

<snip>
"Its content MUST be an IRI, as defined by [RFC3987].  Note that the
   definition of "IRI" excludes relative references.  Though the IRI
   might use a dereferencable scheme, Atom Processors MUST NOT assume it
   can be dereferenced.

   When an Atom Document is relocated, migrated, syndicated,
   republished, exported, or imported, the content of its atom:id
   element MUST NOT change.  Put another way, an atom:id element
   pertains to all instantiations of a particular Atom entry or feed;
   revisions retain the same content in their atom:id elements.  It is
   suggested that the atom:id element be stored along with the
   associated resource.

   The content of an atom:id element MUST be created in a way that
   assures uniqueness.

   Because of the risk of confusion between IRIs that would be
   equivalent if they were mapped to URIs and dereferenced, the
   following normalization strategy SHOULD be applied when generating
   atom:id elements:

   o  Provide the scheme in lowercase characters.
   o  Provide the host, if any, in lowercase characters.
   o  Only perform percent-encoding where it is essential.
   o  Use uppercase A through F characters when percent-encoding.
   o  Prevent dot-segments from appearing in paths.
   o  For schemes that define a default authority, use an empty
      authority if the default is desired.
   o  For schemes that define an empty path to be equivalent to a path
      of "/", use "/".
   o  For schemes that define a port, use an empty port if the default
      is desired.
   o  Preserve empty fragment identifiers and queries.
   o  Ensure that all components of the IRI are appropriately character
      normalized, e.g., by using NFC or NFKC.

4.2.6.1.  Comparing atom:id

   Instances of atom:id elements can be compared to determine whether an
   entry or feed is the same as one seen before.  Processors MUST
   compare atom:id elements on a character-by-character basis (in a
   case-sensitive fashion).  Comparison operations MUST be based solely
   on the IRI character strings and MUST NOT rely on dereferencing the
   IRIs or URIs mapped from them.

   As a result, two IRIs that resolve to the same resource but are not
   character-for-character identical will be considered different for
   the purposes of identifier comparison.

   For example, these are four distinct identifiers, despite the fact
   that they differ only in case:

      http://www.example.org/thing
      http://www.example.org/Thing
      http://www.EXAMPLE.org/thing
      HTTP://www.example.org/thing

   Likewise, these are three distinct identifiers, because IRI
   %-escaping is significant for the purposes of comparison:

      http://www.example.com/~bob
      http://www.example.com/%7ebob
      http://www.example.com/%7Ebob"

</snip>
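
As a minimal sketch of the comparison rule quoted above (Python, purely illustrative; the name `same_atom_id` is an assumption made for this example and does not come from JSPWiki or any Atom library), the whole rule reduces to plain string equality with no normalization and no dereferencing:

```python
def same_atom_id(a: str, b: str) -> bool:
    """Compare two atom:id values character by character.

    The comparison is case-sensitive and never normalizes or
    dereferences the IRIs; it only looks at the strings themselves.
    """
    return a == b


# The four identifiers from the quoted text that differ only in case.
case_variants = [
    "http://www.example.org/thing",
    "http://www.example.org/Thing",
    "http://www.EXAMPLE.org/thing",
    "HTTP://www.example.org/thing",
]

# The three identifiers that differ only in %-escaping.
escape_variants = [
    "http://www.example.com/~bob",
    "http://www.example.com/%7ebob",
    "http://www.example.com/%7Ebob",
]

# Counting unique strings shows that none of them collapse together.
distinct_case = len(set(case_variants))
distinct_escape = len(set(escape_variants))
```

Under these assumptions `distinct_case` is 4 and `distinct_escape` is 3, matching the counts in the quoted examples. Any normalization has to happen when the identifier is generated, not when it is compared.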

I like atom:id much more than the dc:identifier, because

a) atom:id conforms to very precise semantics, including comparison
   rules (which dc:identifier does not give)
b) atom:id is defined as globally unique and non-dereferenceable
   (which helps a *lot* when you don't get people assuming that there's
   something at the end of your IRI)
c) atom:id is defined as an IRI instead of a URI (small difference,
   but might be important)
d) atom:id is defined as unique across the entire lifespan of the
   entity, which dc:identifier is not.
e) Atom feeds make a lot of sense to use, even in a wiki context (and
   you need the atom:id anyway)

Since atom:id is a machine-processable entity, having clear,
machine-understandable rules as to what it really is, is very, very
important. For dc:identifier, it's pretty much handwaving.

Not that I'm aware of. DC doesn't get into that kind of thing much
except when you get to things like dates.

I would actually like to use the atom:person construct here, since it
has better semantics (it adds an IRI to a name, which can be useful
in figuring out across wikis who actually authored what). But it
might be easier just to store a local identifier, in which case dc
is as good as any.

It certainly suits the role of both dc:creator, editor, translator,
etc. (i.e., very general purpose), anyone who contributes to the
resource.


But again, the definition is a bit handwavy.

Recommendation: Use DCTERMS.format. This is the term used to contain
a format identifier. While I recognise that these discussions tend to
I would need to check if it's okay.
That one is pretty common.

Unfortunately, it just says that the "best practice" is to use
something like MIME. Now the problem is that in order to consider
e.g. data portability, there's no way to say that "this
dcterms:format" means a MIME type. So again, a system processing the
information needs to resort to context-sensitive processing (e.g.
"ok, so this comes from jspwiki, so it's always a MIME type").
Which isn't really very good. This is why I would like to have an
unambiguous "wiki:contentType" definition, which can also be reflected
in a non-modifiable pseudoproperty "dcterms:format".

E.g. "wiki:contentType contains a STRING, which denotes the MIME
content type of the content as defined in RFC XXXX [MIME]."

For example, if it's just defined as a String, how do you define
equivalence rules? Is it okay to put in IMAGE/JPG, or ImAgE/jpG, or
image/jpg? If you do not know that these are MIME types, and RFC XXXX
defines MIME comparison as case-insensitive, then your application
might be functioning wrong.
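
To make the difference concrete, here is a small sketch (Python, illustrative only; `media_type_key` and `same_media_type` are hypothetical names, and the sketch assumes the value is known to be a MIME type whose type and subtype compare case-insensitively). A processor that only knows "it's a String" would have to fall back on exact comparison and get a different answer:

```python
def media_type_key(value: str) -> str:
    """Reduce a content type string to a comparable 'type/subtype' key.

    Assumption: the value is a MIME type, so the type and subtype
    are case-insensitive. Parameters such as '; charset=UTF-8' are
    dropped, because their values may be case-sensitive and are
    not part of this comparison.
    """
    essence = value.split(";", 1)[0]
    return essence.strip().lower()


def same_media_type(a: str, b: str) -> bool:
    """True if both strings name the same MIME type/subtype."""
    return media_type_key(a) == media_type_key(b)


candidates = ["IMAGE/JPG", "ImAgE/jpG", "image/jpg"]

# Treated as opaque strings, all three values are different...
distinct_as_strings = len(set(candidates))

# ...but treated as MIME types, they collapse to a single key.
distinct_as_mime = len({media_type_key(c) for c in candidates})
```

With these definitions `distinct_as_strings` is 3 while `distinct_as_mime` is 1, which is exactly the ambiguity an explicit property definition would remove.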

This is really my gripe with Dublin Core - it leaves too much up for
interpretation. Which makes it really good for people, but
cumbersome for computers.

It's a Big Deal for a lot of people, I probably don't care much
either.
I use 'text/wiki' for general purpose wiki text and the application
one above to specifically tag JSPWiki wiki text.


I don't think you can use text/wiki - it's missing the "x-" ;-)

It might be interesting to just adopt the practice other wiki
engines are using.


/Janne
