Re: Metadata in 3.0 [Was: JSPWiki 3 design notes]

Murray Altheim Tue, 05 Feb 2008 05:36:55 -0800

Janne Jalkanen wrote:

Now, before getting into this too deeply it occurs to me that we might
consider a pluggable meta API rather than single metadata schema. There


Um.  Pluggable?  No.  That'll create loads of problems.


I don't think it will. There's a core set of fields but their names
should probably be abstractions. I'm trying to think through how this
might work without loads of problems. There are so many applications
for JSPWiki (in terms of how it might fit into other applications)
that we'll need to fit into others' metadata schemes. What I'm
talking about are really surface names for things.

User-access to the metadata?  Absolutely.   And I think that is what
you really mean - ability to add your own arbitrary metadata for any
Node.


Well, yes, but also having the field names match a given schema. Maybe
some kind of transformation feature, dunno.
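
A minimal Java sketch of what "surface names" could mean, assuming a
small fixed set of core fields. The class, enum, and method names are
hypothetical (none of this is existing JSPWiki API); the DCTERMS and
Atom names are the ones discussed in this thread. It only maps names,
so it says nothing about semantic differences between, say,
atom:updated and DCTERMS.modified.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: core metadata fields exported under schema-specific names. */
final class SurfaceNames {

    /** Abstract core fields; these names are invented for illustration. */
    enum CoreField { CREATED, MODIFIED, IDENTIFIER, CREATOR, CONTRIBUTOR, FORMAT }

    /** Dublin Core surface names, as recommended in this thread. */
    static final Map<CoreField, String> DCTERMS = Map.of(
            CoreField.CREATED, "DCTERMS.created",
            CoreField.MODIFIED, "DCTERMS.modified",
            CoreField.IDENTIFIER, "DCTERMS.identifier",
            CoreField.CREATOR, "DCTERMS.creator",
            CoreField.CONTRIBUTOR, "DCTERMS.contributor",
            CoreField.FORMAT, "DCTERMS.format");

    /** Atom surface names; FORMAT is deliberately left unmapped here. */
    static final Map<CoreField, String> ATOM = Map.of(
            CoreField.CREATED, "atom:published",
            CoreField.MODIFIED, "atom:updated",
            CoreField.IDENTIFIER, "atom:id",
            CoreField.CREATOR, "atom:author",
            CoreField.CONTRIBUTOR, "atom:contributor");

    /** Re-keys core values under one profile's names; unmapped fields are skipped. */
    static Map<String, String> export(Map<CoreField, String> values,
                                      Map<CoreField, String> profile) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<CoreField, String> e : values.entrySet()) {
            String surface = profile.get(e.getKey());
            if (surface != null) {
                out.put(surface, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<CoreField, String> page = new EnumMap<>(CoreField.class);
        page.put(CoreField.CREATED, "2008-02-05");
        page.put(CoreField.FORMAT, "application/x-wiki+jspwiki");
        System.out.println(export(page, DCTERMS));
        System.out.println(export(page, ATOM));
    }
}
```

The stored data stays keyed by the core field; only the export step
knows about a particular schema, which is roughly the "transformation
feature" idea.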

WorldCat), Dublin Core is used in almost the entirety of the world's
libraries for lightweight interchangeable metadata and is compatible
with and/or the basis of the designs used by the W3C and its "semantic
web".


Semantic web is actually a load of bollocks.  But it has some nice
ideas.


Oh, you don't need to convince me of that. I'm on public record with
quite a number of people at the W3C for stating pretty much the same
thing, with little extra diplomacy.

I note that many of the proposed field names come from Atom. While this
is perhaps an appropriate usage, Atom is a syndication schema, not a
content repository schema. There's not a huge difference and Atom is in
large parts (semantically) compatible with and influenced by Dublin Core
(e.g., choice of atom:creator). For documents stored in a repository I
believe Dublin Core is likely more appropriate.


There are some reasons why I chose Atom identifiers; I was involved in
its definition (somewhat), and therefore I know some of the reasons
why Atom does not use Dublin Core.  Partly because some of the
definitions were a bit complicated.


Hmm. I find DC pretty simple in general, at least for the majority of
terms we'd use.

Historically, there are two Dublin Core schemas, DC.* and DCTERMS.*.
The original core set (about a dozen) of Dublin Core Metadata Elements
(DC.*) have been grandfathered into the set of DC Terms (DCTERMS, see
footnote). For our purposes below, we can consider DC.* and DCTERMS.*
as identical namespaces (they by definition now are).


I wasn't aware of dcterms.  Interesting.


Mostly DCTERMS moves the qualifiers down into a flattened namespace,
which is simpler, certainly.

 * atom:updated As in RFC 4287. This is a DATE.

Recommendation: Use DCTERMS.modified. [DC.date or DC.date.modified]
DATE.


The semantics of atom:updated and dcterms.modified differ - and I seem
to recall that that difference is minuscule, but actually very
important.  Can't dig up the reference now, will do later.


Would be interested, thanks.

 * atom:published As in RFC 4287. As JSPWiki does not yet support
   "draft" pages, this is essentially a creation date. NB: This cannot
   be checked from page version #1, because that might be deleted.
   This is a DATE.

Recommendation: Use DCTERMS.created. Agreed: this must be carried
through all revisions since it provides a canonical container for the
origin date of the document. [DC.date or DC.date.created]
DATE.


Probably better.
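
A rough Java sketch of "carried through all revisions", assuming each
revision holds its own small property map. The class and method names
are hypothetical, and the plain Map stands in for whatever the
repository actually provides. The point is only that the creation date
is copied forward rather than read back from version #1, which may have
been deleted.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: date properties when a new page revision is saved. */
final class RevisionDates {

    /**
     * Builds the date properties for a new revision. The creation date is
     * copied from the previous revision; if there is none, this revision
     * is treated as the first and "now" becomes the creation date.
     */
    static Map<String, Instant> nextRevision(Map<String, Instant> previous, Instant now) {
        Map<String, Instant> next = new HashMap<>();
        Instant created = previous.get("DCTERMS.created");
        next.put("DCTERMS.created", created != null ? created : now);
        next.put("DCTERMS.modified", now);
        return next;
    }

    public static void main(String[] args) {
        Instant first = Instant.parse("2008-02-01T10:00:00Z");
        Instant later = Instant.parse("2008-02-05T13:36:55Z");
        Map<String, Instant> v1 = nextRevision(new HashMap<>(), first);
        Map<String, Instant> v2 = nextRevision(v1, later);
        System.out.println(v2);
    }
}
```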

 * atom:id As in RFC 4287. This has some advantages, and can easily be
   tied to the JCR jcr:uuid. This is a STRING.

Recommendation: Use DCTERMS.identifier. [DC.identifier]
   STRING (URI?)


Nope.  Atom:id is a very, very useful construct.  As you mentioned in
the last email, you probably want to use dc:identifier for your own
purposes.


Well, I also mentioned that I really doubt that I'd be using dc:identifier
for those purposes within the JSPWiki metadata profile. I can also see
creating a suitable ID within our own namespace, but I really think
dc:identifier would suit fine. We'd not be abusing it at all.

Recommendation: Use DCTERMS.creator. The Atom specification seems to borrow
extensively from DC, with atom:author identical with the concept of
DC.creator (they apparently just didn't like the term 'creator' and
changed it to 'author'), but do use 'contributor' in the same manner
(again, paraphrasing the terminology from DC). This will need to occur
in all revisions since we need to maintain the original author ID
regardless of the existence of a given revision. [DC.creator]
STRING.


I seem to recall that dc requires a specific notation for the user
data - which might be incompatible with what we have (essentially the
uid).  It might be useful to provide a pseudo-property dc:creator
which is constructed out of wiki:creator and UserDatabase data.


Not that I'm aware of. DC doesn't get into that kind of thing much
except when you get to things like dates.
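
For what it's worth, the pseudo-property idea could look something like
the Java sketch below. Everything in it is an assumption made for
illustration: the class name, the property map, and the lookup function
standing in for UserDatabase (it is not the real JSPWiki UserDatabase
interface). Only the uid is stored, as wiki:creator; dc:creator is
computed on request and falls back to the uid when no name is known.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch: dc:creator derived from wiki:creator plus a user lookup. */
final class CreatorPseudoProperty {

    /**
     * Returns the value to expose as dc:creator, or null if the page has
     * no wiki:creator. The lookup maps a uid to a display name, if known.
     */
    static String dcCreator(Map<String, String> props,
                            Function<String, Optional<String>> fullNameLookup) {
        String uid = props.get("wiki:creator");
        if (uid == null) {
            return null;
        }
        // Fall back to the raw uid when the user is unknown.
        return fullNameLookup.apply(uid).orElse(uid);
    }

    public static void main(String[] args) {
        // Stand-in for user data; the entry is made up for the example.
        Map<String, String> users = Map.of("jdoe", "Jane Doe");
        Function<String, Optional<String>> lookup =
                uid -> Optional.ofNullable(users.get(uid));

        System.out.println(dcCreator(Map.of("wiki:creator", "jdoe"), lookup));
        System.out.println(dcCreator(Map.of("wiki:creator", "ghost"), lookup));
    }
}
```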

Recommendation: Use DCTERMS.contributor.  The idea with DC.creator and
DC.contributor is that the former is the original creator (author) of
a resource, and any subsequent contributions (editing, translation, etc.)
are considered as being done by a 'contributor'. For the original author,
see wiki:creator (DC.creator) above. [DC.contributor]


I am not certain whether dc:contributor is syntactically okay.


It certainly suits the roles of dc:creator, editor, translator,
etc. (i.e., very general purpose), anyone who contributes to the
resource.

Recommendation: Use new application profile wiki:content. Question
as to binary stream? Not STRING?


JPEGs are badly presented as Strings.


Ah, yes.

Recommendation: Use DCTERMS.format. This is the term used to contain
a format identifier.  While I recognise that these discussions tend to


I would need to check if it's okay.


That one is pretty common.

devolve rather quickly, I would highly recommend considering the MIME
or Internet Media Type as "application/*" instead of "text/*", e.g.,
"application/x-wiki+jspwiki". In looking at the history of "text/html"
vs. "application/html" this would suggest that text formats that use
a significant amount of processing to perform rendering generally move
towards being considered more an application than a text format (i.e.,
that while they may be largely human readable they quickly become
indecipherable or largely unreadable in practice when used with plugins
and other complex syntax, e.g., many if not most pages on Wikipedia).
[DC.format] STRING.


I don't really know about this.  I don't care much.


It's a Big Deal for a lot of people; I probably don't care much either.
I use 'text/wiki' for general purpose wiki text and the application
one above to specifically tag JSPWiki wiki text.

Recommended: Use new application profile wiki:state. Enumerated value
set. Not labeled as BOOLEAN but seems to be.


Yes, boolean.

In summary, while I see Atom as interesting and in large part semantically
compatible with Dublin Core, I think it'd be better to incorporate a
schema that was designed more specifically for resources than for feeds;
the definitions fit more closely with our usage.


I think we need to define the exact semantics of the properties we
want to use, and then choose what is most appropriate - or define our own.


Yup.

I'm outa here... gotta run.

Murray

...........................................................................
Murray Altheim <murray07 at altheim.com>                           ===  = =
http://www.altheim.com/murray/                                     = =  ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk               = =  = =

      Boundless wind and moon - the eye within eyes,
      Inexhaustible heaven and earth - the light beyond light,
      The willow dark, the flower bright - ten thousand houses,
      Knock at any door - there's one who will respond.
                                      -- The Blue Cliff Record
