Re: [fcrepo-user] ECM validation of MIMETYPE

Stephen Bayliss Mon, 19 Dec 2011 23:54:39 -0800

Hi Asger

Thanks for your analysis of this.


My thoughts are (based on my use case):

> Now, how would this help the original problem? Steve wanted 
> to have a more specific specification in the data object than 
> in the content model. Generally, we would need to create an 
> inheritance tree for the mime-types. If the content model 
> requires text/plain, then text/xml should be ok. We will also 
> need to define alike mime types, such as text/xml and 
> application/xml. There is a sizable document about this in 
> http://tools.ietf.org/html/rfc3023

I wouldn't go as far as defining a tree/hierarchy of mime-types - I think it
would be acceptable to list all of the acceptable mime-types in the content
model (which can be done in the current design as you point out).


> 
> If the content model declared a mimetype parameter, it should 
> be required in the data objects, but the data objects should 
> be allowed to have additional parameters? Should the content 
> model have a way to specify that the dataobject should have 
> the exact mimetype, and not having additional parameters?

Yes, if the content model declares only a mime-type and no parameters, the
data objects must match on the mime-type but may optionally declare
parameters (so, content model could specify text/xml, but data objects could
specify text/xml plus charset).

> 
> 
> Should we implement the same lenient rules for format_uri? 
> Ie. should the validator understand about how some format 
> uris can be descendants of each other?

Probably for now just stay with listing the acceptable ones in the content
model (unless people have more complex use cases?)
> 
> 
>   
> This is just some thoughts on the issue. I do feel that the 
> current design, where you can specify a number of mimetypes 
> in the content model, and the object is valid if at least one 
> of them matches, is fine for all usecases I can think of.

Yes, I think the only addition is separating off the parameters so that
- if the content model specifies mime-type only, data objects must match on
the mime-type, but may specify parameters
- if the content model specifies mime-type plus parameter x, data objects
must match on mime-type, parameter x (and its value), and may optionally have
additional parameters
- I suppose there is the theoretical case where the content model could
specify no mime type but a parameter (eg "; charset=utf-8") - but would that
be useful in practice?
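
A minimal sketch of the three rules above, in Python rather than the validator's own code. The names parse_mimetype and satisfies are made up for illustration and are not part of ECM. It assumes parameter values are compared case-sensitively (so "utf-8" and "UTF-8" differ) and it splits naively on ";", so a quoted value containing ";" would not be handled:

```python
def parse_mimetype(value):
    """Split 'type/subtype; a=b; c="d"' into (media type, {name: value})."""
    media_type, *raw_params = value.split(";")
    params = {}
    for raw in raw_params:
        if not raw.strip():
            continue  # tolerate a trailing ';'
        name, _, val = raw.partition("=")
        # Assumption: names are case-insensitive, values are kept as written.
        params[name.strip().lower()] = val.strip().strip('"')
    # Type and subtype are compared case-insensitively.
    return media_type.strip().lower(), params


def satisfies(declared, actual):
    """True if the datastream MIMETYPE 'actual' satisfies the content
    model's 'declared' MIMETYPE under the lenient rules."""
    d_type, d_params = parse_mimetype(declared)
    a_type, a_params = parse_mimetype(actual)
    # An empty declared type (the "; charset=utf-8" case) matches any type.
    if d_type and d_type != a_type:
        return False
    # Every declared parameter must be present with the same value;
    # extra parameters on the data object are allowed.
    return all(a_params.get(n) == v for n, v in d_params.items())
```

With this, satisfies("text/xml", "text/xml; charset=UTF-8") holds, while satisfies("text/xml; charset=UTF-8", "text/xml") does not.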

Regards
Steve




> -----Original Message-----
> From: Asger Askov Blekinge [mailto:a...@statsbiblioteket.dk] 
> Sent: 19 December 2011 16:47
> To: fedora-commons-users@lists.sourceforge.net
> Subject: Re: [fcrepo-user] ECM validation of MIMETYPE
> 
> 
> Okay, here are the distilled rules, which we must at least stay within:
> 
> Format of a mimetype
> type/subtype; name1=value1; name2=value2
> 
> media-type     = type "/" subtype *( ";" parameter )
>        type         = token
>        subtype    = token
>        parameter         = attribute "=" value
>              attribute              = token
>              value                   = token | quoted-string
> 
> 
> 
> Axioms:
> the order of the name/value pairs is not important
> name/value pairs are separated by whitespace
> tokens cannot contain whitespace
> values can be case sensitive
> all values, except the last, should end in a ;
> 
> Okay, the rules for mime-types are a bit more complex than I 
> originally thought. 
> 
> 1. Should we implement these rules in the rules for the 
> content model, in order to allow people to validate their 
> mimetype-specifications? Ie. to avoid having a content model 
> require that objects use a wrongly formatted mimetype.
> 
> Next, to compare if two mime-types are equal.
> Basically, compare textually until the first ;
> then split the remainder on ;
> split each split on =
> sort on the split names.
> Compare the two split-lists
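
The comparison steps quoted above could be sketched roughly as follows, in Python for illustration only. The function names normalise and same_mimetype are hypothetical, not taken from the ECM validator, and the split on ";" is naive, so it assumes no quoted value contains ";" or "=":

```python
def normalise(mimetype):
    """Return (type/subtype, sorted list of (name, value) pairs)."""
    # Everything before the first ';' is the type/subtype part.
    media_type, *rest = mimetype.split(";")
    pairs = []
    for part in rest:
        if part.strip():  # skip empty pieces such as a trailing ';'
            name, _, value = part.partition("=")
            pairs.append((name.strip(), value.strip()))
    # Sorting on the names makes the parameter order irrelevant.
    return media_type.strip(), sorted(pairs)


def same_mimetype(a, b):
    """Strict equality: same type/subtype and the same set of parameters."""
    return normalise(a) == normalise(b)
```

Here "text/xml; charset=UTF-8; version=1.0" and "text/xml;version=1.0;charset=UTF-8" compare equal, but "text/xml" and "text/xml; charset=UTF-8" do not.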
> 
> Now, how would this help the original problem? Steve wanted 
> to have a more specific specification in the data object than 
> in the content model. Generally, we would need to create an 
> inheritance tree for the mime-types. If the content model 
> requires text/plain, then text/xml should be ok. We will also 
> need to define alike mime types, such as text/xml and 
> application/xml. There is a sizable document about this in 
> http://tools.ietf.org/html/rfc3023
> 
> If the content model declared a mimetype parameter, it should 
> be required in the data objects, but the data objects should 
> be allowed to have additional parameters? Should the content 
> model have a way to specify that the dataobject should have 
> the exact mimetype, and not having additional parameters?
> 
> 
> Should we implement the same lenient rules for format_uri? 
> Ie. should the validator understand about how some format 
> uris can be descendants of each other?
> 
> 
>   
> This is just some thoughts on the issue. I do feel that the 
> current design, where you can specify a number of mimetypes 
> in the content model, and the object is valid if at least one 
> of them matches, is fine for all usecases I can think of.
> 
> Regards
> 
> 
> 
> 
> 
> 
> On 12/19/2011 04:45 PM, Stephen Bayliss wrote: 
> Hi Asger
> 
> That's a good point.  Presumably a definition as per
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17
with the media types as per
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7

So from that the main type has to go first and my guess is that the order of
the parameters is not important.  Though I wonder if there are further
levels of complexity; eg could charset be utf-8 and UTF-8 and both be
equivalent?  (It looks like utf-8 is the canonical form though for text/xml)

Steve
-----Original Message-----
From: Asger Askov Blekinge [mailto:a...@statsbiblioteket.dk] 
Sent: 19 December 2011 14:29
To: fedora-commons-users@lists.sourceforge.net
Subject: Re: [fcrepo-user] ECM validation of MIMETYPE


Yes, but we would need to specify some rules then.

charset is not the only subtype allowed; I do believe this is an open-ended
set. I do know people have been using "version" as well.

So, I would need to know how to split a mime-type, and whether the order of
the subtypes is important.

Secondly, you can of course specify multiple form statements in the content
model, the requirement is just that ONE of them match. So, specify the
various allowed charsets, and one without charset, and you should be safe.

Regards

On 12/15/2011 01:03 PM, Stephen Bayliss wrote: 
As far as I can tell, ECM validation of a datastream's MIMETYPE is strict -
the entire MIMETYPE property contents have to match that declared in the
content model. What about the case where one might want to specify the
MIMETYPE of a datastream in the CModel, but not the character set?  If I
specify MIMETYPE as "text/xml" in the CModel and as "text/xml;
charset=UTF-8" in the object, it fails validation. Would it make sense to
only validate charset if it is defined in the content model?

Regards
Steve


