> Now, before getting into this too deeply it occurs to me that we might
> consider a pluggable meta API rather than a single metadata schema. There

Um. Pluggable? No. That'll create loads of problems. User access to the
metadata? Absolutely. And I think that is what you really mean: the ability
to add your own arbitrary metadata to any Node.

> WorldCat), Dublin Core is used in almost the entirety of the world's
> libraries for lightweight interchangeable metadata and is compatible
> with and/or the basis of the designs used by the W3C and its "semantic
> web".

Semantic web is actually a load of bollocks. But it has some nice ideas.

> I note that many of the proposed field names come from Atom. While this
> is perhaps an appropriate usage, Atom is a syndication schema, not a
> content repository schema. There's not a huge difference and Atom is in
> large parts (semantically) compatible with and influenced by Dublin Core
> (e.g., choice of atom:creator). For documents stored in a repository I
> believe Dublin Core is likely more appropriate.

There are some reasons why I chose Atom identifiers: I was (somewhat) involved
in its definition, so I know some of the reasons why Atom does not use Dublin
Core. Partly it was because some of the definitions were a bit complicated.

> Historically, there are two Dublin Core schemas, DC.* and DCTERMS.*.
> The original core set (about a dozen) of Dublin Core Metadata Elements
> (DC.*) have been grandfathered into the set of DC Terms (DCTERMS, see
> footnote). For our purposes below, we can consider DC.* and DCTERMS.*
> as identical namespaces (they by definition now are).

I wasn't aware of dcterms. Interesting.

> * atom:updated As in RFC 4287. This is a DATE.
>
> Recommendation: Use DCTERMS.modified. [DC.date or DC.date.modified]
> DATE.

The semantics of atom:updated and dcterms.modified differ, and I seem to
recall that the difference is minuscule but actually very important. I can't
dig up the reference now; I will do so later.

> * atom:published As in RFC 4287. As JSPWiki does not yet support
> "draft" pages, this is essentially a creation date. NB: This cannot
> be checked from page version #1, because that might be deleted.
> This is a DATE.
>
> Recommendation: Use DCTERMS.created. Agreed: this must be carried
> through all revisions since it provides a canonical container for the
> origin date of the document. [DC.date or DC.date.created]
> DATE.

Probably better.

> * atom:id As in RFC 4287. This has some advantages, and can easily be
> tied to the JCR jcr:uuid. This is a STRING.
>
> Recommendation: Use DCTERMS.identifier. [DC.identifier]
> STRING (URI?)

Nope. atom:id is a very, very useful construct. As you mentioned in your last
email, you probably want to use dc:identifier for your own purposes.

> Recommendation: Use DCTERMS.creator. The Atom specification seems to borrow
> extensively from DC, with atom:author identical with the concept of
> DC.creator (they apparently just didn't like the term 'creator' and
> changed it to 'author'), but do use 'contributor' in the same manner
> (again, paraphrasing the terminology from DC). This will need to occur
> in all revisions since we need to maintain the original author ID
> regardless of the existence of a given revision. [DC.creator]
> STRING.

I seem to recall that DC requires a specific notation for the user data,
which might be incompatible with what we have (essentially the uid). It might
be useful to provide a pseudo-property dc:creator which is constructed out of
wiki:creator and UserDatabase data.

> Recommendation: Use DCTERMS.contributor. The idea with DC.creator and
> DC.contributor is that the former is the original creator (author) of
> a resource, and any subsequent contributions (editing, translation, etc.)
> are considered as being done by a 'contributor'. For the original author,
> see wiki:creator (DC.creator) above. [DC.contributor]

I am not certain whether dc:contributor is syntactically okay.

> Recommendation: Use new application profile wiki:content. Question
> as to binary stream? Not STRING?

JPEGs are poorly represented as Strings.
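
For illustration, here is a minimal Java sketch of the pseudo-property idea: dc:creator is computed on read from the stored wiki:creator uid plus a user lookup, and atom:id is derived from the node's jcr:uuid as a urn:uuid IRI (the URN form defined in RFC 4122). Everything here is hypothetical: PageMetadataView and its methods are not JSPWiki or JCR API, the plain Function stands in for a UserDatabase query, and returning the user's full name (falling back to the uid) is an assumed format, since the notation DC expects for creators is still unchecked.

```java
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

/**
 * Hypothetical read-only view that derives pseudo-properties from the
 * properties a page actually stores. Nothing here is JSPWiki or JCR API.
 */
public final class PageMetadataView {

    private final Map<String, String> stored;             // e.g. "wiki:creator" -> uid
    private final Function<String, String> fullNameOfUid; // stand-in for a UserDatabase lookup; may return null

    public PageMetadataView(Map<String, String> stored,
                            Function<String, String> fullNameOfUid) {
        this.stored = stored;
        this.fullNameOfUid = fullNameOfUid;
    }

    /** atom:id as a urn:uuid IRI built from the node's jcr:uuid, or null if absent. */
    public String atomId() {
        String raw = stored.get("jcr:uuid");
        if (raw == null) {
            return null;
        }
        // UUID.fromString rejects malformed input; toString normalises to lowercase hex.
        return "urn:uuid:" + UUID.fromString(raw);
    }

    /** dc:creator computed on read; never stored, so it cannot drift from wiki:creator. */
    public String dcCreator() {
        String uid = stored.get("wiki:creator");
        if (uid == null) {
            return null;
        }
        String fullName = fullNameOfUid.apply(uid);
        // Assumed output format: the full name if the user is known, else the raw uid.
        return fullName != null ? fullName : uid;
    }

    public static void main(String[] args) {
        Map<String, String> page = Map.of(
                "jcr:uuid", "3F2504E0-4F89-11D3-9A0C-0305E82C3301",
                "wiki:creator", "jdoe");
        Map<String, String> users = Map.of("jdoe", "Jane Doe"); // hypothetical user
        PageMetadataView view = new PageMetadataView(page, users::get);
        System.out.println(view.atomId());    // urn:uuid:3f2504e0-4f89-11d3-9a0c-0305e82c3301
        System.out.println(view.dcCreator()); // Jane Doe
    }
}
```

Because nothing derived is stored, changing a user's display name or the DC mapping later does not require rewriting old revisions; the trade-off is a user lookup on every read.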

> Recommendation: Use DCTERMS.format. This is the term used to contain
> a format identifier.

I would need to check if it's okay.

> While I recognise that these discussions tend to
> devolve rather quickly, I would highly recommend considering the MIME
> or Internet Media Type as "application/*" instead of "text/*", e.g.,
> "application/x-wiki+jspwiki". In looking at the history of "text/html"
> vs. "application/html" this would suggest that text formats that use
> a significant amount of processing to perform rendering generally move
> towards being considered more an application than a text format (i.e.,
> that while they may be largely human readable they quickly become
> indecipherable or largely unreadable in practice when used with plugins
> and other complex syntax, e.g., many if not most pages on Wikipedia).
> [DC.format] STRING.

I don't really know about this. I don't care much.

> Recommended: Use new application profile wiki:state. Enumerated value
> set. Not labeled as BOOLEAN but seems to be.

Yes, boolean.

> In summary, while I see Atom as interesting and in large part semantically
> compatible with Dublin Core, I think it'd be better to incorporate a
> schema that was designed more specifically for resources than for feeds;
> the definitions fit more closely with our usage.

I think we need to define the exact semantics of the properties we want to
use, and then choose what is most appropriate, or define our own. I'll need
to check dcterms, though.

/Janne
