Okay, here are the distilled rules, which we must at least stay within.

Format of a mimetype
type/subtype; name1=value1; name2=value2

media-type = type "/" subtype *( ";" parameter )
    type      = token
    subtype   = token
    parameter = attribute "=" value
        attribute = token
        value     = token | quoted-string
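As a rough sketch of what checking a mimetype against this grammar could look like (Python only for illustration; the name is_valid_media_type is made up here, and the token character set is my reading of RFC 2616 section 2.2, so treat it as an assumption rather than a finished validator):

```python
import re

# token: one or more characters that are neither control chars nor separators
# (assumed from RFC 2616 section 2.2)
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string: double-quoted text, backslash escapes allowed
QUOTED = r'"(?:[^"\\]|\\.)*"'
# parameter: ";" attribute "=" value, tolerating whitespace around the ";"
PARAM = rf"\s*;\s*{TOKEN}=(?:{TOKEN}|{QUOTED})"
MEDIA_TYPE = re.compile(rf"{TOKEN}/{TOKEN}(?:{PARAM})*\s*")


def is_valid_media_type(text: str) -> bool:
    """Return True if text matches type "/" subtype *( ";" parameter )."""
    return MEDIA_TYPE.fullmatch(text) is not None
```

This only checks the shape of the string; it does not know whether a type or parameter is actually registered.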



Axioms:
The order of the name/value pairs is not important.
Name/value pairs are separated by a ;, optionally followed by whitespace.
Tokens cannot contain whitespace.
Values can be case-sensitive (type, subtype and parameter names are not).
Every name/value pair, except the last, should end in a ;

Okay, the rules for mime-types are a bit more complex than I originally thought.

1. Should we implement these rules in the rules for the content model, so that people can validate their mimetype specifications? I.e. to avoid having a content model require that objects use a wrongly formatted mimetype.

Next, to compare whether two mime-types are equal:
Basically, compare the text up to the first ;
then split the remainder on ;
split each piece on =
sort on the parameter names.
Compare the two sorted lists.
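The steps above could be sketched roughly like this. The function names are made up, and two things are assumptions on my part: type/subtype and parameter names are lowercased before comparing, while values are compared as-is; and the naive split on ; would break on a quoted-string value that itself contains a ;.

```python
def normalize(mimetype):
    """Split a mimetype into (type/subtype, sorted list of (name, value))."""
    head, *params = mimetype.split(";")
    pairs = []
    for param in params:
        if not param.strip():
            continue  # ignore empty pieces, e.g. after a trailing ;
        name, _, value = param.partition("=")
        # names are lowercased, values keep their case
        pairs.append((name.strip().lower(), value.strip()))
    return head.strip().lower(), sorted(pairs)


def same_mimetype(a, b):
    """Equal if type/subtype match and the sorted parameter lists match."""
    return normalize(a) == normalize(b)
```

With this, "text/xml; charset=UTF-8; version=1.0" and "text/xml;version=1.0;charset=UTF-8" compare equal, while charset=utf-8 and charset=UTF-8 do not, since values are compared case-sensitively.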

Now, how would this help the original problem? Steve wanted to allow a more specific mimetype in the data object than in the content model. Generally, we would need to create an inheritance tree for the mime-types: if the content model requires text/plain, then text/xml should be ok. We would also need to define equivalent mime types, such as text/xml and application/xml. There is a sizable document about this at
http://tools.ietf.org/html/rfc3023

If the content model declares a mimetype parameter, should it be required in the data objects, while the data objects are allowed to have additional parameters? Should the content model have a way to specify that the data object must have the exact mimetype, with no additional parameters?
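To make the question concrete, here is a minimal sketch of one possible answer, not how ECM validates today. The names parse and satisfies and the exact flag are hypothetical; it assumes declared parameters must be present with the same value, that extra parameters are tolerated unless exact is set, and that no value contains a ;.

```python
def parse(mimetype):
    """Return (type/subtype, {parameter name: value})."""
    head, *params = mimetype.split(";")
    parsed = {}
    for param in params:
        name, sep, value = param.partition("=")
        if sep:  # skip pieces that have no "="
            parsed[name.strip().lower()] = value.strip()
    return head.strip().lower(), parsed


def satisfies(declared, actual, exact=False):
    """Check an object's mimetype (actual) against the content model's (declared)."""
    declared_type, declared_params = parse(declared)
    actual_type, actual_params = parse(actual)
    if declared_type != actual_type:
        return False
    if exact:
        # strict mode: no missing and no additional parameters
        return declared_params == actual_params
    # lenient mode: every declared parameter must be present and equal
    return all(actual_params.get(k) == v for k, v in declared_params.items())
```

Under these assumptions, a content model declaring "text/xml" would accept an object with "text/xml; charset=UTF-8", which was Steve's original case, and would reject it only when exact=True.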


Should we implement the same lenient rules for format_uri? I.e. should the validator understand that some format URIs can be descendants of each other?



These are just some thoughts on the issue. I do feel that the current design, where you can specify a number of mimetypes in the content model and the object is valid if at least one of them matches, is fine for all use cases I can think of.

Regards






On 12/19/2011 04:45 PM, Stephen Bayliss wrote:
Hi Asger
That's a good point. Presumably a definition as per http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17 with the media types as per http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7 . So from that, the main type has to go first, and my guess is that the order of the parameters is not important. Though I wonder if there are further levels of complexity; e.g. could charset be utf-8 and UTF-8 and both be equivalent? (It looks like utf-8 is the canonical form for text/xml, though.)
Steve

    -----Original Message-----
    *From:* Asger Askov Blekinge [mailto:a...@statsbiblioteket.dk]
    *Sent:* 19 December 2011 14:29
    *To:* fedora-commons-users@lists.sourceforge.net
    *Subject:* Re: [fcrepo-user] ECM validation of MIMETYPE

    Yes, but we would need to specify some rules then.

    charset is not the only parameter allowed; I do believe this is an
    open-ended set. I do know people have been using "version" as well.

    So, I would need to know how to split a mime-type, and whether the
    order of the parameters is important.

    Secondly, you can of course specify multiple form statements in
    the content model; the requirement is just that ONE of them matches.
    So, specify the various allowed charsets, and one without charset,
    and you should be safe.

    Regards

    On 12/15/2011 01:03 PM, Stephen Bayliss wrote:

    As far as I can tell, ECM validation of a datastream’s MIMETYPE
    is strict – the entire MIMETYPE property contents have to match
    that declared in the content model.

    What about the case where one might want to specify the MIMETYPE
    of a datastream in the CModel, but not the character set?  If I
    specify MIMETYPE as “text/xml” in the CModel and as “text/xml;
    charset=UTF-8” in the object, it fails validation.

    Would it make sense to only validate charset if it is defined in
    the content model?

    Regards

    Steve


