dcc-associates  

Re: [dcc-associates] expressing certainty in PREMIS

Tim DiLauro
Thu, 15 Jul 2010 06:44:50 -0700

On Jul 14, 2010, at 12:41 PM, David Rosenthal wrote:

> This means that running these tools,  remembering their results,  and
> using those results at a later time is a very bad idea.  If the
> information is needed at a later time,  the tools should be re-run.
> And the information should be used with the knowledge that some of the
> results at any given time will be wrong.

But running these tools repeatedly over large amounts of data is expensive.  
Finding ways to reduce this need would be useful.

One approach to consider would be keeping the *multiple* results of each of the 
various tools rather than (or, perhaps, in addition to) the *unified* result of 
all of them.  Associated with each of these results would be some 
identification of the source (tools, versions of particular formats, etc.).  
These data could be evaluated during preservation evaluation processing by 
looking one level deeper when asking the usual question: "Is this format at 
risk?" Instead of stopping there, we could start with the question: "Is this 
version of the data about this format or from this tool invalidated or 
questionable?"  If so, then it should be marked as such and a corrected version 
generated, if possible.  My language here is rather imprecise, but I hope the 
point is coming through.
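
To make this slightly more concrete, here is a minimal Python sketch of the 
idea, not a real implementation.  Every name in it (ToolResult, FileRecord, 
tools_to_rerun, the tool names and version strings) is hypothetical and 
invented for illustration; none of it comes from PREMIS or from any existing 
tool.  It keeps each tool's result separately, together with the tool and 
version that produced it, and then asks which stored results come from a 
source that has since been marked as questionable.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolResult:
    """One tool's stored verdict about a file, with its provenance (hypothetical)."""
    tool: str          # name of the identification/validation tool
    tool_version: str  # version of the tool (or of its format data) that ran
    format_id: str     # the format this tool reported for the file


@dataclass
class FileRecord:
    """All per-tool results kept for one file, rather than one unified result."""
    path: str
    results: list = field(default_factory=list)


def questionable_results(record, invalidated):
    """Return stored results whose (tool, version) source has been invalidated."""
    return [r for r in record.results
            if (r.tool, r.tool_version) in invalidated]


def tools_to_rerun(record, invalidated):
    """Names of the tools that would need re-running for this file."""
    return {r.tool for r in questionable_results(record, invalidated)}


# Example: only the result from the invalidated tool version is flagged.
record = FileRecord("example.tif", [
    ToolResult("tool-a", "1.0", "image/tiff"),
    ToolResult("tool-b", "2.3", "image/tiff"),
])
invalidated = {("tool-a", "1.0")}  # assumed to come from some registry of known-bad sources
print(tools_to_rerun(record, invalidated))
```

In a sketch like this, only the files with a flagged result would need to be 
revisited, and only with the affected tool, instead of re-running everything 
over the whole collection.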

Perhaps the http://code.google.com/p/fits/ tool -- which I agree should be 
renamed to avoid confusion with the FITS *format* -- could be modified to 
perform such a re-evaluation/validation and possibly reuse these already 
captured data.

~Tim