On Thu, Apr 12, 2012 at 12:17, Shanu Jha <[email protected]> wrote:
> There are two approaches to checking for duplicate items in DSpace at the
> time of submission:
> 1. Item metadata - create a checksum from the metadata
> 2. Bitstream - create a checksum from the bitstream
> Neither approach leads to a stable solution. As a bitstream, the same
> document can appear in many different MIME types (e.g. PDF, DOC), and
> different MIME types will have different checksums.

1. This may require quite a bit of programming to do correctly, but it
would certainly be a valuable addition to DSpace. A checksum only
catches exact matches, so rather than checksumming the metadata you
should fuzzy-match it and possibly display multiple matches.
2. I'm afraid there's no viable way to tell that e.g. a .doc file
matches a .pdf file. You could compare the internal metadata of these
files, but some PDF printers do not carry it over from the source
document. It may be possible in a controlled environment, e.g. if you
prescribed a specific PDF printer for your submitters.
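
To illustrate what I mean by fuzzy-matching in point 1, here is a rough
sketch in Python. It is not DSpace code and does not use any DSpace API;
the function names, the 0.85 threshold and the idea of comparing only
titles are all assumptions made for the example. It normalizes a
submitted title and scores it against existing titles with the standard
library's difflib, returning every candidate above the threshold so the
submitter can be shown multiple possible matches.

```python
import difflib
import re


def normalize(title):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def find_possible_duplicates(new_title, existing_titles, threshold=0.85):
    """Return (score, title) pairs scoring at or above threshold, best first.

    Hypothetical helper: 'existing_titles' stands in for whatever list of
    titles you would pull from the repository.
    """
    target = normalize(new_title)
    matches = []
    for title in existing_titles:
        score = difflib.SequenceMatcher(None, target, normalize(title)).ratio()
        if score >= threshold:
            matches.append((score, title))
    return sorted(matches, reverse=True)


if __name__ == "__main__":
    existing = [
        "Duplicate detection in digital repositories",
        "An introduction to DSpace",
    ]
    # A submission with a typo and different capitalization still matches.
    for score, title in find_possible_duplicates(
        "Duplicate Detecton in Digital Repositories.", existing
    ):
        print(f"{score:.2f}  {title}")
```

A real implementation would probably compare several fields (title,
author, date) and would need an index rather than a linear scan, but the
principle is the same: score and rank candidates instead of comparing
checksums.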

Regards,
~~helix84

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
