Tim DiLauro
Thu, 15 Jul 2010 06:44:50 -0700
On Jul 14, 2010, at 12:41 PM, David Rosenthal wrote:

> This means that running these tools, remembering their results, and
> using those results at a later time is a very bad idea. If the
> information is needed at a later time, the tools should be re-run.
> And the information should be used with the knowledge that some of the
> results at any given time will be wrong.

But running these tools repeatedly over large amounts of data is expensive. Finding ways to reduce this need would be useful.

One approach to consider would be keeping the *multiple* results of each of the various tools rather than (or, perhaps, in addition to) the *unified* result of all of them. Associated with each of these results would be some identification of the source (tools, versions of particular formats, etc.).

These data could be evaluated during preservation evaluation processing by looking one level deeper when asking the usual question: "Is this format at risk?" Instead of stopping there, we could start with the question: "Is this version of the data about this format or from this tool invalidated or questionable?" If so, then it should be marked as such and a corrected version generated, if possible.

My language here is rather imprecise, but I hope the point is coming through.

Perhaps the http://code.google.com/p/fits/ tool -- which I agree should be renamed to avoid confusion with the FITS *format* -- could be modified to perform such a re-evaluation/validation and possibly reuse this already captured data.

~Tim
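
To make the per-tool idea above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `ToolResult` record, the `reevaluate` and `needs_rerun` helpers, and the tool names and versions are all hypothetical and are not taken from the fits tool or from any existing identification tool. It only shows the general shape: keep each tool's result along with its source, mark results from a source later found unreliable, and re-run only those.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToolResult:
    """One stored result from one tool (hypothetical record layout)."""
    tool: str              # placeholder tool name, not a real product
    tool_version: str      # version of the tool that produced the result
    reported_format: str   # what the tool said the file was
    questionable: bool = False


def reevaluate(results, invalidated):
    """Return copies of the results, marking those whose (tool, version)
    source appears in the set of sources now considered unreliable."""
    return [
        replace(r, questionable=True)
        if (r.tool, r.tool_version) in invalidated
        else r
        for r in results
    ]


def needs_rerun(results):
    """Names of the tools whose stored results should be regenerated."""
    return sorted({r.tool for r in results if r.questionable})


# Stored per-tool results for a single file (made-up values).
stored = [
    ToolResult("tool-a", "1.0", "PDF 1.4"),
    ToolResult("tool-b", "2.3", "PDF 1.4"),
    ToolResult("tool-c", "0.9", "PDF 1.3"),
]

# Suppose tool-c 0.9 is later found to misreport this format.
invalidated = {("tool-c", "0.9")}

checked = reevaluate(stored, invalidated)
rerun = needs_rerun(checked)
```

In this sketch only `tool-c` would need to be run again for this file; the other stored results are reused as they are, which is where the cost saving would come from.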