I don't think it will. There's a core set of fields but their names
should probably be abstractions. I'm trying to think through how this
might work without loads of problems. There are so many applications
for JSPWiki (in terms of how it might fit into other applications)
that we'll need to fit into others' metadata schemes. What I'm
talking about are really surface names for things.
Yes, it will. If the provider has to figure out mapping between
different concepts in the database, it'll create problems.
This is exactly why namespaces were invented, and this is also why it
would probably be a better idea NOT to reuse Dublin Core, but to
stick to our own schema.
Well, yes, but also having the field names match a given schema. Maybe
some kind of transformation feature, dunno.
I think namespaces are quite enough for us. I don't really want to
code for the case in case someone wants to use "wiki:author" for some
other purpose.
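To make the namespace point concrete, here is a minimal sketch using the JDK's javax.xml.namespace.QName. The "wiki" namespace URI below is a made-up placeholder (nothing has been decided for JSPWiki), and the class and method names are mine. The point is only that a property is identified by namespace URI plus local name, and that the prefix is a surface name.

```java
import javax.xml.namespace.QName;

final class MetadataNames {
    // Real Dublin Core element set namespace.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    // Hypothetical namespace for JSPWiki's own schema; placeholder only.
    static final String WIKI_NS = "http://example.org/ns/wiki#";

    private MetadataNames() {}

    static QName dc(String localName) {
        return new QName(DC_NS, localName, "dc");
    }

    static QName wiki(String localName) {
        return new QName(WIKI_NS, localName, "wiki");
    }

    // QName.equals() compares namespace URI and local part only,
    // so the prefix someone happens to pick does not matter.
    static boolean sameProperty(QName a, QName b) {
        return a.equals(b);
    }
}
```

With this, somebody else's "wiki:author" bound to a different namespace URI is simply a different property, and no special-case code is needed for it.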
If people want, they *can* rewrite their own backend in such a way
that it converts everything into paper notes stuck onto a donkey
glued to a wall somewhere in Pakistan with the word "CUCKOO" written
on the backside - but after the JCR interface, I don't really care
what transformations you do.
Well, I also mentioned that I really doubt that I'd be using
dc:identifier for those purposes within the JSPWiki metadata profile.
I can also see
creating a suitable ID within our own namespace, but I really think
dc:identifier would suit fine. We'd not be abusing it at all.
Ah yes, now I found it. From RFC 5013:
<snip>
"Element Name: identifier
Label: Identifier
Definition: An unambiguous reference to the resource within a given
context.
Comment: Recommended best practice is to identify the
resource by means of a string conforming
to a formal identification system."
</snip>
Whereas from RFC 4287 (Atom)
<snip>
"Its content MUST be an IRI, as defined by [RFC3987]. Note that the
definition of "IRI" excludes relative references. Though the IRI
might use a dereferencable scheme, Atom Processors MUST NOT assume it
can be dereferenced.
When an Atom Document is relocated, migrated, syndicated,
republished, exported, or imported, the content of its atom:id
element MUST NOT change. Put another way, an atom:id element
pertains to all instantiations of a particular Atom entry or feed;
revisions retain the same content in their atom:id elements. It is
suggested that the atom:id element be stored along with the
associated resource.
The content of an atom:id element MUST be created in a way that
assures uniqueness.
Because of the risk of confusion between IRIs that would be
equivalent if they were mapped to URIs and dereferenced, the
following normalization strategy SHOULD be applied when generating
atom:id elements:
o Provide the scheme in lowercase characters.
o Provide the host, if any, in lowercase characters.
o Only perform percent-encoding where it is essential.
o Use uppercase A through F characters when percent-encoding.
o Prevent dot-segments from appearing in paths.
o For schemes that define a default authority, use an empty
authority if the default is desired.
o For schemes that define an empty path to be equivalent to a path
of "/", use "/".
o For schemes that define a port, use an empty port if the default
is desired.
o Preserve empty fragment identifiers and queries.
o Ensure that all components of the IRI are appropriately character
normalized, e.g., by using NFC or NFKC.
4.2.6.1. Comparing atom:id
Instances of atom:id elements can be compared to determine whether an
entry or feed is the same as one seen before. Processors MUST
compare atom:id elements on a character-by-character basis (in a
case-sensitive fashion). Comparison operations MUST be based solely
on the IRI character strings and MUST NOT rely on dereferencing the
IRIs or URIs mapped from them.
As a result, two IRIs that resolve to the same resource but are not
character-for-character identical will be considered different for
the purposes of identifier comparison.
For example, these are four distinct identifiers, despite the fact
that they differ only in case:
http://www.example.org/thing
http://www.example.org/Thing
http://www.EXAMPLE.org/thing
HTTP://www.example.org/thing
Likewise, these are three distinct identifiers, because IRI
%-escaping is significant for the purposes of comparison:
http://www.example.com/~bob
http://www.example.com/%7ebob
http://www.example.com/%7Ebob"
</snip>
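As a rough sketch of what the quoted rules mean in code (the class and method names are mine, not anything in JSPWiki): comparison is a plain case-sensitive string compare, and normalization is something you do once, when generating the id. The helper below covers only two of the listed normalization steps (lowercase scheme, uppercase hex in percent-escapes); the rest are left out.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class AtomIds {
    private static final Pattern PERCENT_ESCAPE = Pattern.compile("%[0-9a-fA-F]{2}");

    private AtomIds() {}

    // RFC 4287, 4.2.6.1: character-by-character, case-sensitive,
    // never dereferencing the IRI.
    static boolean sameId(String a, String b) {
        return a.equals(b);
    }

    // Partial normalization for *generating* ids: lowercases the scheme
    // and uppercases the hex digits of percent-escapes. It does not touch
    // host case, dot-segments, ports, or Unicode normalization.
    static String partiallyNormalize(String iri) {
        int colon = iri.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("atom:id must not be a relative reference: " + iri);
        }
        String scheme = iri.substring(0, colon).toLowerCase(Locale.ROOT);
        String rest = PERCENT_ESCAPE.matcher(iri.substring(colon))
                .replaceAll(m -> m.group().toUpperCase(Locale.ROOT));
        return scheme + rest;
    }
}
```

So sameId("http://www.example.com/%7ebob", "http://www.example.com/%7Ebob") is false, exactly as in the RFC's example, and that is why the id should be normalized when it is created and then stored as-is.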
I like atom:id much more than the dc:identifier, because
a) atom:id conforms to very precise semantics, including comparison
rules (which dc:identifier does not give)
b) atom:id is defined as globally unique and non-dereferenceable
(which helps a *lot* when you don't get people assuming that there's
something at the end of your IRI)
c) atom:id is defined as an IRI instead of a URI (small difference,
but might be important)
d) atom:id is defined as unique across the entire lifespan of the
entity, which dc:identifier is not.
e) Atom feeds make a lot of sense to use, even in a wiki context (and
you need the atom:id anyway)
Since atom:id is a machine-processable entity, having clear, machine-
understandable rules as to what it really is, is very, very
important. For dc:identifier, it's pretty much handwaving.
Not that I'm aware of. DC doesn't get into that kind of thing much
except when you get to things like dates.
I would actually like to use the atom:person construct here, since it
has better semantics (it adds an IRI to a name, which can be useful
in figuring out across wikis who actually authored what). But it
might be easier just to store a local identifier, in which case dc
is as good as any.
It certainly suits the role of both dc:creator, editor, translator,
etc. (i.e., very general purpose), anyone who contributes to the
resource.
But again, the definition is a bit handwavy.
Recommendation: Use DCTERMS.format. This is the term used to contain
a format identifier. While I recognise that these discussions tend to
I would need to check if it's okay.
That one is pretty common.
Unfortunately, it just says that the "best practice" is to use
something like MIME. Now the problem is that in order to consider
e.g. data portability, there's no way to say that "this
dcterms:format" means a MIME type. So again, a system processing the
information needs to resort to context-sensitive processing (e.g.
"ok, so this comes from jspwiki, so it's always a MIME type").
Which isn't really very good. This is why I would like to have an
unambiguous "wiki:contentType" definition, which can also be reflected
in a non-modifiable pseudoproperty "dcterms:format".
E.g. "wiki:contentType contains a STRING, which denotes the MIME
content type of the content as defined in RFC XXXX [MIME]."
For example, if it's just defined as a String, how do you define
equivalence rules? Is it okay to put in IMAGE/JPG, or ImAgE/jpG, or
image/jpg? If you do not know that these are MIME types, and RFC XXXX
defines MIME comparison as case-insensitive, then your application
might be functioning wrong.
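A minimal sketch of what an unambiguous definition buys you, assuming the MIME rule that type and subtype names match case-insensitively. The class name WikiContentType and the method dctermsFormat() are hypothetical, and parameters (";charset=..." and so on) are rejected just to keep the example short.

```java
import java.util.Locale;

final class WikiContentType {
    // Canonical lowercase "type/subtype" form.
    private final String value;

    WikiContentType(String raw) {
        String trimmed = raw.trim();
        int slash = trimmed.indexOf('/');
        if (slash <= 0 || slash == trimmed.length() - 1 || trimmed.indexOf(';') >= 0) {
            throw new IllegalArgumentException("expected a bare type/subtype: " + raw);
        }
        // Because the value is known to be a MIME type, folding case is safe.
        this.value = trimmed.toLowerCase(Locale.ROOT);
    }

    // Read-only reflection of the same value as "dcterms:format".
    String dctermsFormat() {
        return value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof WikiContentType
                && value.equals(((WikiContentType) other).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Here IMAGE/JPG, ImAgE/jpG and image/jpg all compare equal. With a field that is only defined as "a string", the application has no way of knowing whether folding case like this is allowed.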
This is really my gripe with Dublin Core - it leaves too much open to
interpretation. Which makes it really good for people, but
cumbersome for computers.
It's a Big Deal for a lot of people, I probably don't care much
either.
I use 'text/wiki' for general purpose wiki text and the application
one above to specifically tag JSPWiki wiki text.
I don't think you can use text/wiki - it's missing the "x-" ;-)
It might be interesting to just adopt the practice other wiki engines
are using.
/Janne