On Thu, Apr 12, 2012 at 12:17, Shanu Jha <[email protected]> wrote:
> There would be two approaches to check for duplicate items in DSpace at
> the time of submission:
> 1. Item metadata - create a checksum using the metadata
> 2. Bitstream - create a checksum using the bitstream
> Neither approach leads us to a stable solution. With bitstreams, the same
> document can appear in many different MIME types (i.e. PDF, DOC etc.), and
> different MIME types will have different checksums.

1. This may require quite a bit of programming to do correctly, but it
would certainly be a valuable addition to DSpace. Rather than checksumming
the metadata, you should fuzzy-match it and possibly display multiple
matches.

2. I'm afraid there's no viable way to tell that e.g. a .doc file matches
a .pdf file. You could look at the internal metadata of these files, but
there are PDF printers which do not transfer it. It may be possible in a
controlled environment, e.g. if you prescribed a specific PDF printer for
your submitters.

Regards,
~~helix84

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
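
To illustrate the fuzzy-matching idea from point 1: the following is only a minimal sketch in Python, not DSpace code. It assumes item titles are compared using difflib from the Python standard library. The function names, the item identifiers and the 0.85 threshold are made up for illustration and would need tuning against real metadata.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_possible_duplicates(new_title, existing, threshold=0.85):
    """Return (item_id, title, score) tuples at or above threshold, best first.

    `existing` maps a hypothetical item identifier to its title.
    """
    matches = []
    for item_id, title in existing.items():
        score = similarity(new_title, title)
        if score >= threshold:
            matches.append((item_id, title, score))
    return sorted(matches, key=lambda m: m[2], reverse=True)


if __name__ == "__main__":
    # Made-up repository contents, for illustration only.
    repository = {
        "123/1": "Digital libraries - an overview",
        "123/2": "Quantum chromodynamics on the lattice",
    }
    for item_id, title, score in find_possible_duplicates(
        "Digital Libraries: An Overview", repository
    ):
        print(f"{item_id}  {score:.2f}  {title}")
```

A checksum would treat the two "Digital libraries" titles as different because of case and punctuation, while the ratio above reports them as a match. Returning a ranked list rather than a yes/no answer lets you display several candidate matches and leave the decision to a person.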

