I've raised https://jira.duraspace.org/browse/FCREPO-1057 based on the thread so far; please add any comments to the issue.
Thanks
Steve

> -----Original Message-----
> From: Stephen Bayliss [mailto:stephen.bayl...@acuityunlimited.net]
> Sent: 20 December 2011 07:55
> To: 'Support and info exchange list for Fedora users.'
> Subject: Re: [fcrepo-user] ECM validation of MIMETYPE
>
> Hi Asger
>
> Thanks for your analysis of this.
>
> My thoughts are (based on my use case):
>
> > Now, how would this help the original problem? Steve wanted
> > to have a more specific specification in the data object than
> > in the content model. Generally, we would need to create an
> > inheritance tree for the mime-types. If the content model
> > requires text/plain, then text/xml should be ok. We will also
> > need to define alike mime types, such as text/xml and
> > application/xml. There is a sizable document about this in
> > http://tools.ietf.org/html/rfc3023
>
> I wouldn't go as far as defining a tree/hierarchy of
> mime-types - I think it would be acceptable to list all of
> the acceptable mime-types in the content model (which can be
> done in the current design as you point out).
>
> > If the content model declared a mimetype parameter, it should
> > be required in the data objects, but the data objects should
> > be allowed to have additional parameters? Should the content
> > model have a way to specify that the dataobject should have
> > the exact mimetype, and not having additional parameters?
>
> Yes, if the content model declares only a mime-type and no
> parameters, the data objects must match on the mime-type but
> may optionally declare parameters (so, content model could
> specify text/xml, but data objects could specify text/xml
> plus charset).
>
> > Should we implement the same lenient rules for format_uri?
> > Ie. should the validator understand about how some format
> > uris can be descendants of each other?
>
> Probably for now just stay with listing the acceptable ones
> in the content model (unless people have more complex use cases?)
>
> > These are just some thoughts on the issue. I do feel that the
> > current design, where you can specify a number of mimetypes
> > in the content model, and the object is valid if at least one
> > of them matches, is fine for all use cases I can think of.
>
> Yes, I think the only addition is separating off the
> parameters so that
> - if the content model specifies mime-type only, data objects
>   must match on the mime-type, but may specify parameters
> - if the content model specifies mime-type plus parameter x,
>   data objects must match on mime-type, parameter x (and its
>   value), and may optionally have additional parameters
> - I suppose there is the theoretical case where the content
>   model could specify no mime type but a parameter (eg
>   "; charset=utf-8") - but would that be useful in practice?
>
> Regards
> Steve
>
> > -----Original Message-----
> > From: Asger Askov Blekinge [mailto:a...@statsbiblioteket.dk]
> > Sent: 19 December 2011 16:47
> > To: fedora-commons-users@lists.sourceforge.net
> > Subject: Re: [fcrepo-user] ECM validation of MIMETYPE
> >
> > Okay, here are distilled rules, which we must at least stay within.
> >
> > Format of a mimetype
> > type/subtype; name1=value1; name2=value2
> >
> > media-type = type "/" subtype *( ";" parameter )
> > type = token
> > subtype = token
> > parameter = attribute "=" value
> > attribute = token
> > value = token | quoted-string
> >
> > Axioms:
> > the order of the name/value pairs is not important
> > each name/value pair is separated with whitespaces.
> > Tokens cannot contain whitespaces
> > values can be case sensitive
> > All values, except the last, should end in a ;
> >
> > Okay, the rules for mime-types are a bit more complex than I
> > originally thought.
> >
> > 1. Should we implement these rules in the rules for the
> > content model, in order to allow people to validate their
> > mimetype-specifications? Ie.
> > to avoid having a content model
> > require that objects used a wrongly formatted mimetype.
> >
> > Next, to compare if two mime-types are equal.
> > Basically, compare textually until the first ;
> > then split the remainder on ;
> > split each split on =
> > sort on the split names.
> > Compare the two split-lists
> >
> > Now, how would this help the original problem? Steve wanted
> > to have a more specific specification in the data object than
> > in the content model. Generally, we would need to create an
> > inheritance tree for the mime-types. If the content model
> > requires text/plain, then text/xml should be ok. We will also
> > need to define alike mime types, such as text/xml and
> > application/xml. There is a sizable document about this in
> > http://tools.ietf.org/html/rfc3023
> >
> > If the content model declared a mimetype parameter, it should
> > be required in the data objects, but the data objects should
> > be allowed to have additional parameters? Should the content
> > model have a way to specify that the dataobject should have
> > the exact mimetype, and not having additional parameters?
> >
> > Should we implement the same lenient rules for format_uri?
> > Ie. should the validator understand about how some format
> > uris can be descendants of each other?
> >
> > These are just some thoughts on the issue. I do feel that the
> > current design, where you can specify a number of mimetypes
> > in the content model, and the object is valid if at least one
> > of them matches, is fine for all use cases I can think of.
> >
> > Regards
> >
> > On 12/19/2011 04:45 PM, Stephen Bayliss wrote:
> > Hi Asger
> >
> > That's a good point.
> > Presumably a definition as per
> > http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17
> with the media types as per
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7
>
> So from that the main type has to go first and my guess is
> that the order of the parameters is not important. Though I
> wonder if there are further levels of complexity; eg could
> charset be utf-8 and UTF-8 and both be equivalent? (It looks
> like utf-8 is the canonical form though for text/xml)
>
> Steve
>
> -----Original Message-----
> From: Asger Askov Blekinge [mailto:a...@statsbiblioteket.dk]
> Sent: 19 December 2011 14:29
> To: fedora-commons-users@lists.sourceforge.net
> Subject: Re: [fcrepo-user] ECM validation of MIMETYPE
>
> Yes, but we would need to specify some rules then.
>
> charset is not the only subtype allowed, I do believe this is
> an openended set. I do know people have been using "version" as well.
>
> So, I would need to know how to split a mime-type and if the
> order of the subtypes is important?
>
> Secondly, you can of course specify multiple form statements
> in the content model, the requirement is just that ONE of
> them match. So, specify the various allowed charsets, and one
> without charset, and you should be safe.
>
> Regards
>
> On 12/15/2011 01:03 PM, Stephen Bayliss wrote:
> As far as I can tell, ECM validation of a datastream's
> MIMETYPE is strict - the entire MIMETYPE property contents
> have to match that declared in the content model. What about
> the case where one might want to specify the MIMETYPE of a
> datastream in the CModel, but not the character set? If I
> specify MIMETYPE as "text/xml" in the CModel and as
> "text/xml; charset=UTF-8" in the object, it fails validation.
> Would it make sense to only validate charset if it is defined
> in the content model?
>
> Regards
> Steve
> _______________________________________________
> Fedora-commons-users mailing list
> Fedora-commons-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
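Asger's equality check from the thread (compare the type/subtype, split the remainder on ; and then on =, sort on the parameter names, compare the two lists) can be sketched in Python roughly as follows. This is an illustration only, not Fedora's ECM validator code: canonical_form, is_token and media_types_equal are hypothetical names, the split on ; and = is naive (quoted-string values containing those characters are not supported), and, going slightly beyond a purely textual compare, it lower-cases the type, subtype and parameter names, which RFC 2616 section 3.7 defines as case-insensitive, while keeping parameter values as written. The token check also touches on his first question, rejecting a malformed mimetype before it ends up in a content model.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration; not part of Fedora's ECM validator.

# RFC 2616 sec 2.2: a token is one or more visible ASCII characters
# excluding these separators.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token(s: str) -> bool:
    return bool(s) and all(33 <= ord(c) <= 126 and c not in SEPARATORS for c in s)


def canonical_form(value: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Parse 'type/subtype; name1=value1; ...' into (type/subtype, sorted params).

    Raises ValueError if the string does not fit the grammar. Assumption: a
    naive split on ';' and '=', so quoted-string values containing either
    character are not supported.
    """
    head, *rest = value.split(";")
    main_type, slash, subtype = head.strip().partition("/")
    if not (slash and is_token(main_type) and is_token(subtype)):
        raise ValueError(f"malformed type/subtype: {head!r}")
    pairs = []
    for part in rest:
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        name, eq, val = part.partition("=")
        name, val = name.strip(), val.strip()
        if not (eq and is_token(name) and val):
            raise ValueError(f"malformed parameter: {part!r}")
        # Parameter names are case-insensitive; values are kept as written.
        pairs.append((name.lower(), val))
    # Type and subtype are case-insensitive (RFC 2616 sec 3.7).
    # Sorting by name makes the comparison independent of parameter order.
    return f"{main_type}/{subtype}".lower(), sorted(pairs)


def media_types_equal(a: str, b: str) -> bool:
    """Exact match: same type/subtype and same parameters, in any order."""
    return canonical_form(a) == canonical_form(b)


if __name__ == "__main__":
    print(media_types_equal("text/xml; charset=UTF-8; version=1.0",
                            "TEXT/XML;version=1.0; charset=UTF-8"))  # True
    print(media_types_equal("text/xml", "text/xml; charset=UTF-8"))  # False
```

Note that this is still a strict comparison: it only removes the sensitivity to parameter order and name case, so "text/xml" in the CModel still does not match "text/xml; charset=UTF-8" in the object.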
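Steve's proposed rule (the data object must match the content model's type/subtype and every parameter the content model declares, but may carry additional parameters), combined with the existing behaviour Asger describes, where the object is valid if at least one of the declared forms matches, might look like the following Python sketch. Again this is hypothetical: parse, matches and validate_mimetype are illustrative names, not part of Fedora. It also assumes charset values should compare case-insensitively, so utf-8 and UTF-8 are treated as equal (RFC 2046 states that charset values are not case sensitive); all other parameter values are compared exactly. No type hierarchy is applied, in line with Steve's suggestion to list the acceptable mimetypes explicitly.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical helpers for illustration; not part of Fedora's ECM validator.


def parse(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'type/subtype; name=value; ...' into (type/subtype, params).

    Assumptions: naive split on ';' and '=' (no quoted-string support);
    type/subtype and parameter names are lower-cased; charset values are
    lower-cased as well, since charset names are case-insensitive; all other
    parameter values are kept exactly as written.
    """
    head, *rest = value.split(";")
    params: Dict[str, str] = {}
    for part in rest:
        name, _, val = part.partition("=")
        name, val = name.strip().lower(), val.strip()
        if not name:
            continue  # skip empty segments, e.g. a trailing ';'
        params[name] = val.lower() if name == "charset" else val
    return head.strip().lower(), params


def matches(required: str, actual: str) -> bool:
    """Lenient rule proposed in the thread:
    - the type/subtype must be identical;
    - every parameter the content model declares must be present with the
      same value;
    - the datastream may carry additional parameters.
    """
    req_type, req_params = parse(required)
    act_type, act_params = parse(actual)
    return req_type == act_type and all(
        act_params.get(name) == val for name, val in req_params.items()
    )


def validate_mimetype(allowed: Iterable[str], actual: str) -> bool:
    """Valid if at least one form listed in the content model matches."""
    return any(matches(form, actual) for form in allowed)


if __name__ == "__main__":
    print(validate_mimetype(["text/xml"], "text/xml; charset=UTF-8"))        # True
    print(validate_mimetype(["text/xml; charset=utf-8"], "text/xml"))        # False
    print(validate_mimetype(["text/plain", "application/xml"], "text/xml"))  # False
```

With this rule, a content model listing only text/xml accepts a datastream declared as "text/xml; charset=UTF-8", which is exactly the case reported as failing under the current strict comparison.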