> What I'd like to do is just validate that the extension of the bitstreams
> submitted during an item deposit corresponds to the actual format(s) of the
> files, using the output of jhove. Now, I don't know how difficult this will
> be, and if you are working on it, and it's going to be coming out in 6
> months, then perhaps my time would be better spent doing something else.
>
> So, what do you think about this? Do you see this sort of functionality
> realistically being available in the near future (version 1.5, perhaps?),
> or is what I want to do not that difficult, and only a small subset of what
> you're working on, so why not just do it?

The work I'm doing might help somewhat, since it will include more sophisticated and accurate format identification, plus a measurement of confidence in the identification, but I can't make any promises about whether it will get into 1.5. Full details should be available on the wiki within a few weeks. I'll announce it on the dspace-tech and dspace-devel lists so the community can comment on my proposal. It's a whole framework for integrating external data format registries (like the GDFR), as well as format-identifying applications. It does not include format validators, but they do have a place in the overall design.

I'm not sure JHOVE version 1 will be much help, either. I recommend taking a hard look at its limitations before spending any time on it. Its repertoire of formats is somewhat limited, and the output is not trivial to interpret. Also, we found it gave a significant number of false negatives when validating.

What's left? Tools like DROID (droid.sourceforge.net), perhaps, although it has some difficulty integrating with DSpace. It just identifies formats, but you could use that as a quasi-validation.

Also, the field of data format representation, identification, and validation is in great flux right now, so there will be improvements. That's why I'm designing a very flexible framework to let DSpace make use of external resources. We heard the JHOVE 2 project just got funded, so that will be worth watching. See http://fileformats.blogspot.com/search/label/JHOVE for some early hints.

-- Larry
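
For illustration only, the "quasi-validation" idea discussed above (compare a bitstream's file extension with what an identification step reports) could be sketched roughly as follows in Python. This does not call DROID, JHOVE, or any DSpace API. The small signature table and the names `SIGNATURES`, `identify`, `check_extension`, and `check_file` are hypothetical, made up for this sketch; a real identification tool knows far more formats and also reports how confident it is.

```python
import os

# Hypothetical, deliberately tiny signature table for illustration:
# leading "magic" bytes -> extensions considered consistent with them.
SIGNATURES = [
    (b"%PDF-", {"pdf"}),
    (b"\x89PNG\r\n\x1a\n", {"png"}),
    (b"GIF87a", {"gif"}),
    (b"GIF89a", {"gif"}),
    (b"\xff\xd8\xff", {"jpg", "jpeg"}),
    # ZIP containers are shared by many formats, so several extensions pass.
    (b"PK\x03\x04", {"zip", "docx", "xlsx", "pptx", "odt", "jar"}),
]


def identify(header):
    """Return the set of plausible extensions for these leading bytes, or None."""
    for magic, extensions in SIGNATURES:
        if header.startswith(magic):
            return extensions
    return None


def check_extension(filename, header):
    """Compare the filename's extension with the identified format.

    Returns "match", "mismatch", or "unknown" (format not identified).
    """
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    expected = identify(header)
    if expected is None:
        return "unknown"
    return "match" if extension in expected else "mismatch"


def check_file(path):
    """Read the first few bytes of a file on disk and run the same check."""
    with open(path, "rb") as handle:
        return check_extension(path, handle.read(16))
```

Note the three-way result: an "unknown" answer is kept separate from "mismatch", so a format the identifier does not recognize is not reported as a failed validation. That matters given the false negatives mentioned above.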