Re: [fcrepo-user] ECM validation of MIMETYPE

Asger Askov Blekinge Mon, 19 Dec 2011 08:48:36 -0800

Okay, here are the distilled rules, which we must at least stay within:


Format of a mimetype
type/subtype; name1=value1; name2=value2

media-type     = type "/" subtype *( ";" parameter )
       type         = token
       subtype    = token
       parameter         = attribute "=" value
             attribute              = token
             value                   = token | quoted-string



Axioms:
The order of the name/value pairs is not important.
Name/value pairs are separated by a ; which may be followed by whitespace.
Tokens cannot contain whitespace.
Values can be case sensitive.
Every name/value pair, except the last, should be followed by a ;

Okay, the rules for mime-types are a bit more complex than I originally thought.

1. Should we implement these rules in the rules for the content model, in order to allow people to validate their mimetype specifications? I.e. to avoid having a content model require that objects use a wrongly formatted mimetype.
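
As a rough illustration of what such a syntax check could look like, here is a minimal Python sketch of the grammar above. The names TOKEN, QUOTED, PARAM, MEDIA_TYPE and is_valid_mimetype are made up for this example and are not part of ECM; the token character set is taken from RFC 2616 section 2.2.

```python
import re

# token per RFC 2616 section 2.2: printable US-ASCII without separators.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string: text in double quotes, allowing backslash escapes.
QUOTED = r'"(?:[^"\\]|\\.)*"'
# One parameter: ";" then attribute "=" value, with optional whitespace
# around the ";" but none around the "=".
PARAM = rf"\s*;\s*{TOKEN}=(?:{TOKEN}|{QUOTED})"
# media-type = type "/" subtype *( ";" parameter )
MEDIA_TYPE = re.compile(rf"{TOKEN}/{TOKEN}(?:{PARAM})*\s*")


def is_valid_mimetype(text: str) -> bool:
    """Return True if text is shaped like type/subtype; name=value; ..."""
    return MEDIA_TYPE.fullmatch(text) is not None
```

This only checks the shape of the string; it does not check that the type or subtype is actually registered anywhere.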


Next, to compare whether two mime-types are equal:
Basically, compare textually up to the first ;
then split the remainder on ;
split each piece on =
sort on the parameter names.
Compare the two sorted lists.
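
The comparison steps above can be sketched in Python as follows. This is only an illustration: split_mimetype and mimetypes_equal are hypothetical names, the naive split on ; would break on a quoted-string value that itself contains a ;, and the lower-casing of type, subtype and parameter names is an addition based on RFC 2616 section 3.7 (which treats those as case-insensitive), not something stated in the steps.

```python
def split_mimetype(text):
    """Return (type/subtype, sorted list of (name, value) pairs)."""
    # Everything before the first ";" is the type/subtype part.
    main, _, rest = text.partition(";")
    params = []
    for part in rest.split(";"):
        if not part.strip():
            continue  # tolerate empty pieces, e.g. after a trailing ";"
        name, _, value = part.partition("=")
        # Names are lower-cased (assumption, see RFC 2616 sec 3.7);
        # values are kept as written, since they can be case sensitive.
        params.append((name.strip().lower(), value.strip()))
    return main.strip().lower(), sorted(params)


def mimetypes_equal(a, b):
    """Compare two mime-types, ignoring parameter order."""
    return split_mimetype(a) == split_mimetype(b)
```

With this sketch, "text/xml; charset=UTF-8; version=1.0" and "text/xml;version=1.0;charset=UTF-8" compare as equal, while charset=UTF-8 and charset=utf-8 do not, because values are compared as written.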

Now, how would this help the original problem? Steve wanted to have a more specific specification in the data object than in the content model. Generally, we would need to create an inheritance tree for the mime-types. If the content model requires text/plain, then text/xml should be ok. We will also need to define which mime types are alike, such as text/xml and application/xml. There is a sizable document about this in

http://tools.ietf.org/html/rfc3023
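
To make the inheritance idea concrete, here is a small Python sketch. The PARENT and ALIASES tables and the function names are purely hypothetical; they contain only the two relationships mentioned above, and a real validator would need a much larger, agreed-upon table.

```python
# Hypothetical tables holding only the relationships named in this mail.
PARENT = {"text/xml": "text/plain"}        # child -> more general type
ALIASES = {"application/xml": "text/xml"}  # alike type -> canonical type


def canonical(mimetype):
    """Normalise case and map alike types onto one canonical name."""
    mimetype = mimetype.strip().lower()
    return ALIASES.get(mimetype, mimetype)


def satisfies(object_type, required_type):
    """True if object_type equals required_type or descends from it."""
    required = canonical(required_type)
    current = canonical(object_type)
    seen = set()  # guards against accidental cycles in PARENT
    while current is not None and current not in seen:
        if current == required:
            return True
        seen.add(current)
        parent = PARENT.get(current)
        current = canonical(parent) if parent is not None else None
    return False
```

Here satisfies("application/xml", "text/plain") is true, while satisfies("text/plain", "text/xml") is false, since the more general type does not satisfy the more specific requirement.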

If the content model declares a mimetype parameter, it should be required in the data objects, but should the data objects be allowed to have additional parameters? Should the content model have a way to specify that the data object must have the exact mimetype, without additional parameters?
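
One possible shape for this lenient matching, as a Python sketch. The exact flag and the function names are assumptions made for the example, not an existing ECM feature, and the simple split on ; does not handle quoted values containing a ;.

```python
def parse(text):
    """Return (type/subtype, dict of parameter name -> value)."""
    main, _, rest = text.partition(";")
    params = {}
    for part in rest.split(";"):
        if part.strip():
            name, _, value = part.partition("=")
            params[name.strip().lower()] = value.strip()
    return main.strip().lower(), params


def matches(object_mimetype, model_mimetype, exact=False):
    """Check an object's mimetype against the content model's.

    Lenient mode: every parameter declared in the model must be present
    in the object with the same value; extra parameters are allowed.
    Exact mode (hypothetical flag): the parameter sets must be identical.
    """
    obj_type, obj_params = parse(object_mimetype)
    model_type, model_params = parse(model_mimetype)
    if obj_type != model_type:
        return False
    if exact:
        return obj_params == model_params
    return all(obj_params.get(k) == v for k, v in model_params.items())
```

With this, matches("text/xml; charset=UTF-8", "text/xml") passes in lenient mode, which is the case Steve raised, and fails when exact=True.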

Should we implement the same lenient rules for format_uri? I.e. should the validator understand how some format URIs can be descendants of each other?

These are just some thoughts on the issue. I do feel that the current design, where you can specify a number of mimetypes in the content model, and the object is valid if at least one of them matches, is fine for all use cases I can think of.


Regards






On 12/19/2011 04:45 PM, Stephen Bayliss wrote:

Hi Asger

That's a good point. Presumably a definition as per
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17
with the media types as per
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7
So from that the main type has to go first and my guess is that the order of the parameters is not important. Though I wonder if there are further levels of complexity; e.g. could charset be utf-8 and UTF-8 and both be equivalent? (It looks like utf-8 is the canonical form for text/xml, though.)

Steve

    -----Original Message-----
    *From:* Asger Askov Blekinge [mailto:a...@statsbiblioteket.dk]
    *Sent:* 19 December 2011 14:29
    *To:* fedora-commons-users@lists.sourceforge.net
    *Subject:* Re: [fcrepo-user] ECM validation of MIMETYPE

    Yes, but we would need to specify some rules then.

    charset is not the only parameter allowed; I do believe this is an
    open-ended set. I do know people have been using "version" as well.

    So, I would need to know how to split a mime-type, and whether the
    order of the parameters is important.

    Secondly, you can of course specify multiple form statements in
    the content model; the requirement is just that ONE of them matches.
    So, specify the various allowed charsets, and one without charset,
    and you should be safe.

    Regards

    On 12/15/2011 01:03 PM, Stephen Bayliss wrote:


    As far as I can tell, ECM validation of a datastream’s MIMETYPE
    is strict – the entire MIMETYPE property contents have to match
    that declared in the content model.

    What about the case where one might want to specify the MIMETYPE
    of a datastream in the CModel, but not the character set?  If I
    specify MIMETYPE as “text/xml” in the CModel and as “text/xml;
    charset=UTF-8” in the object, it fails validation.

    Would it make sense to only validate charset if it is defined in
    the content model?

    Regards

    Steve
