> What I'd like to do is just validate that the extension of the bitstreams
> submitted during an item deposit correspond to the actual format(s) of the
> files using the output of jhove. Now, I don't know how difficult this will
> be, and if you are working on it, and it's going to be coming out in 6
> months, then perhaps my time would be better spent doing something else.
>
> So, what do you think about this?  Do you see this sort of functionality
> realistically being available in the near future (version 1.5, perhaps?),
> or is what I want to do not that difficult, and only a small subset of what
> you're working on, so why not just do it?

The work I'm doing might help somewhat, since it will include more
sophisticated and accurate format identification, plus a measurement of
confidence in the identification -- but I can't make any promises
whether it will get into 1.5.  Full details should be available on the
wiki within a few weeks; I'll announce it on the dspace-tech and
dspace-devel lists so the community can comment on my proposal.  It's a
whole framework for integrating external data format registries (like
the GDFR), as well as format-identifying applications.  It does not
include format validators but they do have a place in the overall
design.

I'm not sure JHOVE version 1 will be much help, either -- I recommend
taking a hard look at its limitations before spending any time on it.
Its repertoire of formats is somewhat limited, and the output is not
trivial to interpret.  Also, we found it gave a significant number of
false negatives when validating.

What's left?  Tools like DROID (droid.sourceforge.net), perhaps, although
it is somewhat difficult to integrate with DSpace.  It only identifies
formats, but you could compare its identification against the file
extension as a kind of quasi-validation.
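To make the quasi-validation idea concrete, here is a minimal sketch in
Python that checks a filename's extension against a format identified from
the file's leading "magic" bytes.  It does not use DROID or JHOVE at all:
the signature table, the extension map, and the function names identify()
and extension_matches() are all hypothetical, and a real identifier uses a
far richer signature registry.

```python
from pathlib import PurePath

# Hypothetical, deliberately tiny signature table: leading bytes -> format.
# A real tool such as DROID draws on a much larger signature registry.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]

# Hypothetical mapping from file extension to the format it claims to be.
EXTENSION_FORMATS = {
    ".pdf": "pdf",
    ".png": "png",
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".gif": "gif",
}


def identify(header: bytes):
    """Return the format whose signature prefixes `header`, or None if unknown."""
    for magic, fmt in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return fmt
    return None


def extension_matches(filename: str, header: bytes):
    """Quasi-validation of an extension against identified content.

    Returns True or False when both the extension and the content are
    recognized, and None when either is unknown (no verdict possible).
    """
    claimed = EXTENSION_FORMATS.get(PurePath(filename).suffix.lower())
    identified = identify(header)
    if claimed is None or identified is None:
        return None
    return claimed == identified


if __name__ == "__main__":
    print(extension_matches("thesis.pdf", b"%PDF-1.4\n"))         # True
    print(extension_matches("photo.jpg", b"\x89PNG\r\n\x1a\n"))   # False
    print(extension_matches("data.xyz", b"%PDF-1.4"))             # None
```

Returning None rather than False for unrecognized input keeps "we could
not tell" separate from "the extension is wrong", which matters when the
identifier's coverage is limited.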

Also, the field of data format representation, identification, and
validation is in great flux right now, so there will be improvements.
That's why I'm designing a very flexible framework to let DSpace make
use of external resources.

We heard the JHOVE 2 project just got funded, so that will be worth
watching.  See http://fileformats.blogspot.com/search/label/JHOVE
for some early hints.

    -- Larry


_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech

Reply via email to