Priscilla Caplan
Thu, 15 Jul 2010 07:49:57 -0700
Either /formatDesignation/ or at least one instance of /formatRegistry/ is required.
The most specific format (or format profile) should be recorded. A repository (or formats registry) may wish to use multipart format names (e.g., "TIFF_GeoTIFF" or "WAVE_MPEG_BWF") to achieve this specificity.
For any given file or bitstream, the most specific format identified by the repository should be recorded. A restricted or modified version of a format is considered more specific than the format; for example, GeoTIFF is more specific than TIFF; BWF is more specific than WAVE.
If a file or bitstream conforms to more than one format of equal specificity, each should be recorded in separate /format/ containers.
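As a rough illustration of the two rules above (record the most specific format, and record every format of equal specificity), here is a minimal Python sketch. The `REFINES` map and the function names are hypothetical and are not part of PREMIS or of any registry; the map simply encodes the examples given above (GeoTIFF refines TIFF, BWF refines WAVE).

```python
# Hypothetical refinement map: each key is a restricted or modified
# version of the format it points to (taken from the examples above).
REFINES = {"GeoTIFF": "TIFF", "BWF": "WAVE"}


def ancestors(fmt):
    """Return every less specific format that fmt refines, directly or indirectly."""
    found = set()
    while fmt in REFINES:
        fmt = REFINES[fmt]
        found.add(fmt)
    return found


def most_specific(identified):
    """Drop any identified format that another identified format refines.

    Each name that remains would go into its own /format/ container.
    """
    identified = set(identified)
    shadowed = set()
    for fmt in identified:
        shadowed |= ancestors(fmt)
    return sorted(identified - shadowed)
```

Under these assumptions, a file identified as both TIFF and GeoTIFF would be recorded only as GeoTIFF, while two unrelated formats would both be kept.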
On 7/15/2010 9:19 AM, Tim DiLauro wrote:
On Jul 14, 2010, at 12:41 PM, David Rosenthal wrote:

This means that running these tools, remembering their results, and using those results at a later time is a very bad idea. If the information is needed at a later time, the tools should be re-run. And the information should be used with the knowledge that some of the results at any given time will be wrong.

But running these tools repeatedly over large amounts of data is expensive. Finding ways to reduce this need would be useful. One approach to consider would be keeping the *multiple* results of each of the various tools rather than (or, perhaps, in addition to) the *unified* result of all of them. Associated with each of these results would be some identification of the source (tools, versions of particular formats, etc.).

These data could be evaluated during preservation evaluation processing by looking one level deeper when asking the usual question: "Is this format at risk?" Instead of stopping there, we could start with the question: "Is this version of the data about this format or from this tool invalidated or questionable?" If so, then it should be marked as such and a corrected version generated, if possible.

My language here is rather imprecise, but I hope the point is coming through. Perhaps the http://code.google.com/p/fits/ tool -- which I agree should be renamed to avoid confusion with FITS *format* -- could be modified to perform such a re-evaluation/validation and possibly reuse this already captured data.

~Tim
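To make the suggestion a little more concrete, here is a minimal Python sketch of keeping each tool's result together with its source and later asking which stored results have become questionable. Everything in it is an assumption for illustration: the `ToolResult` record, the `questionable` function, and the placeholder tool names are hypothetical and do not reflect how FITS or any other identification tool actually stores its output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolResult:
    """One tool's stored answer for one file, kept with its provenance."""
    tool: str          # hypothetical tool name
    tool_version: str  # version of the tool that produced the result
    format_name: str   # format the tool reported


def questionable(results, invalidated):
    """Return stored results whose (tool, version) source is now known to be unreliable.

    Only these would need to be marked and regenerated; the rest can be reused.
    """
    return [r for r in results if (r.tool, r.tool_version) in invalidated]


# Two stored results for the same file, from two placeholder tools.
stored = [
    ToolResult("tool-a", "1.0", "TIFF"),
    ToolResult("tool-b", "2.3", "GeoTIFF"),
]

# Suppose version 1.0 of tool-a is later found to misidentify files.
to_recheck = questionable(stored, {("tool-a", "1.0")})
```

In this sketch only the `tool-a` result would be flagged for re-running, which is the cost saving the approach is after: the remaining stored results stay usable without touching the data again.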