Re: [fcrepo-user] ECM validation of MIMETYPE

Stephen Bayliss Fri, 27 Jan 2012 04:21:25 -0800

I've raised https://jira.duraspace.org/browse/FCREPO-1057 based on the
thread so far; please add any comments to the issue.


Thanks
Steve

> -----Original Message-----
> From: Stephen Bayliss [mailto:stephen.bayl...@acuityunlimited.net] 
> Sent: 20 December 2011 07:55
> To: 'Support and info exchange list for Fedora users.'
> Subject: Re: [fcrepo-user] ECM validation of MIMETYPE
> 
> 
> Hi Asger
> 
> Thanks for your analysis of this.
> 
> My thoughts are (based on my use case):
> 
> > Now, how would this help the original problem? Steve wanted
> > to have a more specific specification in the data object than 
> > in the content model. Generally, we would need to create an 
> > inheritance tree for the mime-types. If the content model 
> > requires text/plain, then text/xml should be ok. We will also 
> > need to define alike mime types, such as text/xml and 
> > application/xml. There is a sizable document about this in 
> > http://tools.ietf.org/html/rfc3023
> 
> I wouldn't go as far as defining a tree/hierarchy of 
> mime-types - I think it would be acceptable to list all of 
> the acceptable mime-types in the content model (which can be 
> done in the current design as you point out).
> 
> 
> > 
> > If the content model declared a mimetype parameter, it should
> > be required in the data objects, but the data objects should 
> > be allowed to have additional parameters? Should the content 
> > model have a way to specify that the dataobject should have 
> > the exact mimetype, and not having additional parameters?
> 
> Yes, if the content model declares only a mime-type and no 
> parameters, the data objects must match on the mime-type but 
> may optionally declare parameters (so, content model could 
> specify text/xml, but data objects could specify text/xml 
> plus charset).
> 
> > 
> > 
> > Should we implement the same lenient rules for format_uri?
> > Ie. should the validator understand about how some format 
> > uris can be descendants of each other?
> 
> Probably for now just stay with listing the acceptable ones 
> in the content model (unless people have more complex use cases?)
> > 
> > 
> >   
> > This is just some thoughts on the issue. I do feel that the
> > current design, where you can specify a number of mimetypes 
> > in the content model, and the object is valid if at least one 
> > of them matches, is fine for all usecases I can think of.
> 
> Yes, I think the only addition is separating off the 
> parameters so that
> - if the content model specifies mime-type only, data objects 
> must match on the mime-type, but may specify parameters
> - if the content model specifies mime-type plus parameter x, 
> data objects must match on mime-type, parameter x (and its 
> value), and may optionally have additional parameters
> - I suppose there is the theoretical case where the content 
> model could specify no mime type but a parameter (eg "; 
> charset=utf-8") - but would that be useful in practice?
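The first two rules above can be sketched roughly as follows. This is only an illustration of the proposal, not the ECM validator's actual code: the names parse, satisfies and is_valid are made up here, parameter values are compared as exact strings, and the naive split on ";" assumes no quoted value contains a semicolon.

```python
def parse(mimetype: str):
    """Split 'type/subtype; a=b' into ('type/subtype', {'a': 'b'})."""
    base, _, rest = mimetype.partition(";")
    params = {}
    for chunk in rest.split(";"):
        name, sep, value = chunk.partition("=")
        if sep:  # ignore empty or malformed segments without '='
            params[name.strip().lower()] = value.strip()
    return base.strip().lower(), params


def satisfies(model_form: str, object_mimetype: str) -> bool:
    """True if the object's MIMETYPE meets one content model form.

    The type/subtype must match, every parameter declared in the model
    must be present with the same value, and the object may carry
    additional parameters.
    """
    model_base, model_params = parse(model_form)
    obj_base, obj_params = parse(object_mimetype)
    if model_base != obj_base:
        return False
    return all(obj_params.get(n) == v for n, v in model_params.items())


def is_valid(model_forms, object_mimetype: str) -> bool:
    """Current design: the object is valid if at least one form matches."""
    return any(satisfies(form, object_mimetype) for form in model_forms)


print(satisfies("text/xml", "text/xml; charset=UTF-8"))   # model has no parameters
print(satisfies("text/xml; charset=UTF-8", "text/xml"))   # required parameter missing
```

The third, theoretical case (a form with a parameter but no mime type) is deliberately not handled in this sketch; such a form would simply never match.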
> 
> Regards
> Steve
> 
> 
> 
> 
> > -----Original Message-----
> > From: Asger Askov Blekinge [mailto:a...@statsbiblioteket.dk]
> > Sent: 19 December 2011 16:47
> > To: fedora-commons-users@lists.sourceforge.net
> > Subject: Re: [fcrepo-user] ECM validation of MIMETYPE
> > 
> > 
> > Okay, here are the distilled rules, which we must at least stay within:
> > 
> > Format of a mimetype
> > type/subtype; name1=value1; name2=value2
> > 
> > media-type     = type "/" subtype *( ";" parameter )
> >        type         = token
> >        subtype    = token
> >        parameter         = attribute "=" value
> >              attribute              = token
> >              value                   = token | quoted-string
> > 
> > 
> > 
> > Axioms:
> > the order of the name/value pairs is not important
> > each name/value pair is separated with whitespace
> > Tokens cannot contain whitespace
> > values can be case sensitive
> > All values, except the last, should end in a ;
> > 
> > Okay, the rules for mime-types are a bit more complex than I
> > originally thought. 
> > 
> > 1. Should we implement these rules in the rules for the
> > content model, in order to allow people to validate their 
> > mimetype-specifications? I.e. to avoid having a content model 
> > require that objects use a wrongly formatted mimetype.
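A syntax check of this kind could look roughly like the sketch below. It assumes the token and quoted-string definitions from RFC 2616 and tolerates optional whitespace around each ";"; the names TOKEN, QUOTED, PARAM, MEDIA_TYPE and is_well_formed are illustrative only and are not part of Fedora or ECM.

```python
import re

# token (RFC 2616): one or more characters, excluding controls,
# whitespace and the separators ( ) < > @ , ; : \ " / [ ] ? = { }
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string: text in double quotes, allowing backslash escapes
QUOTED = r'"(?:[^"\\]|\\.)*"'
# parameter: ";" attribute "=" value, with optional whitespace around ";"
PARAM = rf"\s*;\s*{TOKEN}=(?:{TOKEN}|{QUOTED})"
# media-type = type "/" subtype *( ";" parameter )
MEDIA_TYPE = re.compile(rf"{TOKEN}/{TOKEN}(?:{PARAM})*\s*")


def is_well_formed(mimetype: str) -> bool:
    """Return True if the whole string matches the media-type grammar."""
    return MEDIA_TYPE.fullmatch(mimetype) is not None


for candidate in ("text/xml; charset=UTF-8", "text/xml; charset", "text"):
    print(candidate, is_well_formed(candidate))
```

A content model whose mimetype specification fails such a check could then be rejected before any data object is validated against it.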
> > 
> > Next, to compare whether two mime-types are equal:
> > Basically, compare textually up to the first ;
> > then split the remainder on ;
> > split each piece on =
> > sort on the parameter names.
> > Compare the two sorted lists
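The comparison steps above might be sketched like this. It is a naive illustration under stated assumptions: split_mimetype and mimetypes_equal are hypothetical names, type/subtype and parameter names are lowercased because RFC 2616 treats them as case-insensitive, values are compared as-is, and a quoted value containing ";" or "=" would need a real parser.

```python
def split_mimetype(mimetype: str):
    """Return (type/subtype, list of (name, value) pairs sorted by name)."""
    # Compare textually up to the first ';'
    base, _, rest = mimetype.partition(";")
    params = []
    # Split the remainder on ';', then each piece on '='
    for chunk in rest.split(";"):
        if not chunk.strip():
            continue  # skip empty segments, e.g. when there are no parameters
        name, _, value = chunk.partition("=")
        params.append((name.strip().lower(), value.strip()))
    # Sort on the parameter names so that order does not matter
    return base.strip().lower(), sorted(params)


def mimetypes_equal(a: str, b: str) -> bool:
    """Strict equality: same type/subtype and the same set of parameters."""
    return split_mimetype(a) == split_mimetype(b)


print(mimetypes_equal("text/xml; charset=UTF-8; version=1.0",
                      "text/xml;version=1.0;charset=UTF-8"))
```

Because values are compared exactly, "charset=utf-8" and "charset=UTF-8" count as different here; whether they should be treated as equivalent is a separate decision.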
> > 
> > Now, how would this help the original problem? Steve wanted
> > to have a more specific specification in the data object than 
> > in the content model. Generally, we would need to create an 
> > inheritance tree for the mime-types. If the content model 
> > requires text/plain, then text/xml should be ok. We will also 
> > need to define alike mime types, such as text/xml and 
> > application/xml. There is a sizable document about this in 
> > http://tools.ietf.org/html/rfc3023
> > 
> > If the content model declared a mimetype parameter, it should
> > be required in the data objects, but the data objects should 
> > be allowed to have additional parameters? Should the content 
> > model have a way to specify that the dataobject should have 
> > the exact mimetype, and not having additional parameters?
> > 
> > 
> > Should we implement the same lenient rules for format_uri?
> > Ie. should the validator understand about how some format 
> > uris can be descendants of each other?
> > 
> > 
> >   
> > This is just some thoughts on the issue. I do feel that the
> > current design, where you can specify a number of mimetypes 
> > in the content model, and the object is valid if at least one 
> > of them matches, is fine for all usecases I can think of.
> > 
> > Regards
> > 
> > 
> > 
> > 
> > 
> > 
> > On 12/19/2011 04:45 PM, Stephen Bayliss wrote:
> > Hi Asger
> > 
> > That's a good point.  Presumably a definition as per
> > http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17
> with the media types as per 
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7
> 
> So from that the main type has to go first and my guess is 
> that the order of the parameters is not important.  Though I 
> wonder if there are further levels of complexity; eg could 
> charset be utf-8 and UTF-8 and both be equivalent?  (It looks 
> like utf-8 is the canonical form though for text/xml)
> 
> Steve
> -----Original Message-----
> From: Asger Askov Blekinge [mailto:a...@statsbiblioteket.dk] 
> Sent: 19 December 2011 14:29
> To: fedora-commons-users@lists.sourceforge.net
> Subject: Re: [fcrepo-user] ECM validation of MIMETYPE
> 
> 
> Yes, but we would need to specify some rules then.
> 
> charset is not the only parameter allowed; I believe this is 
> an open-ended set. I do know people have been using "version" as well.
> 
> So, I would need to know how to split a mime-type, and whether the 
> order of the parameters is important.
> 
> Secondly, you can of course specify multiple form statements 
> in the content model; the requirement is just that ONE of 
> them matches. So, specify the various allowed charsets, and one 
> without charset, and you should be safe.
> 
> Regards
> 
> On 12/15/2011 01:03 PM, Stephen Bayliss wrote: 
> As far as I can tell, ECM validation of a datastream's 
> MIMETYPE is strict - the entire MIMETYPE property contents 
> have to match that declared in the content model. What about 
> the case where one might want to specify the MIMETYPE of a 
> datastream in the CModel, but not the character set?  If I 
> specify MIMETYPE as "text/xml" in the CModel and as 
> "text/xml; charset=UTF-8" in the object, it fails validation. 
> Would it make sense to only validate charset if it is defined 
> in the content model?
> 
> Regards
> Steve
> 
> 
> _______________________________________________
> Fedora-commons-users mailing list 
> Fedora-commons-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
> 


