Message: 2
Date: Fri, 16 Nov 2007 4:08:01 EST
From: Larry Stone <[EMAIL PROTECTED]>
Subject: Re: [Dspace-tech] Viruses and DSpace
To: "Blanco, Jose" <[EMAIL PROTECTED]>
Cc: [email protected]
Message-ID: <[EMAIL PROTECTED]>

> Has any thought been given to how Dspace might handle the remote
> (hopefully) possibility of a file containing a virus being deposited
> into a repository? It seems like jhove might be the kind of tool that
> could check for this. I believe there is some work going on to
> incorporate jhove into Dspace, how is that coming along? It's not part
> of 1.5, but what about for the following release?

The BitstreamFormat renovation (see http://wiki.dspace.org/index.php/BitstreamFormat_Renovation ) doesn't address this directly, but it will make it much easier to integrate tools, because file formats will be identified more effectively and precisely. Once the format is known, you can add a mechanism like the mediafilters, perhaps integrated with workflow, to run specific checks depending on the format type.

JHOVE version 1 is just a format validator and technical-metadata extractor; it isn't subtle enough to look for viruses.

There _are_ tools in the email filtering domain which detect malicious MS Office files; I've heard of them but don't remember specifics. You could start by looking around the SpamAssassin software and ClamAV (see http://www.clamav.net/ ). However, be aware that any virus-checking software needs constant updating, since you're essentially in an arms race.

-- Larry (a recovering postmaster)

I realize this may not be particularly relevant to your question, but we handle virus checking as part of the ingest process. ClamAV has Python bindings that make it very simple to integrate into our Python-based batch ingest processes. We also run ClamAV against the Ubuntu-based DSpace server and the Solaris-based storage array containing the asset stores.

Providing feedback to data contributors about malformed, corrupt, incomplete, and infected data sets is a very important part of our pre-ingest workflow. We use Jhove for file validation and look forward to more geospatial format support.
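
As an illustration of the ingest-time scan described above, here is a minimal sketch. It is not the NCSU code: it assumes the `clamscan` command-line tool is installed and on the PATH, and it shells out to that tool rather than using the Python bindings, whose API is not shown in this thread. The names `scan_file` and `split_batch` are hypothetical.

```python
import subprocess


def scan_file(path, run=subprocess.run):
    """Scan one file with the clamscan CLI and return "clean" or "infected".

    clamscan exits with 0 when no virus is found, 1 when a virus is found,
    and 2 when an error occurred. The `run` parameter exists only so the
    subprocess call can be replaced in tests (an assumption of this sketch).
    """
    result = run(
        ["clamscan", "--no-summary", str(path)],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return "clean"
    if result.returncode == 1:
        return "infected"
    raise RuntimeError(f"clamscan failed on {path}: {result.stderr.strip()}")


def split_batch(paths, run=subprocess.run):
    """Partition a batch of files into (clean, infected) lists before ingest."""
    clean, infected = [], []
    for path in paths:
        if scan_file(path, run=run) == "clean":
            clean.append(path)
        else:
            infected.append(path)
    return clean, infected
```

The infected list is what would feed the contributor feedback step; because signatures go stale quickly, a scan like this is only as good as the most recent signature update.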
Jhove is excellent at what it does, but I wouldn't anticipate that it will support virus scanning in the future. In addition, we use the Unix file utility and magic numbers to look for executables, since that would be pretty suspicious for geospatial data.

Jim

-- 
-------------------------------
Jim Tuttle
Geospatial Data Librarian
NCSU Libraries, Box 7111
North Carolina State University
Raleigh, NC 27695-7111
jim_tuttle at ncsu.edu
(919)513-0651 Phone
(919)515-3031 Fax

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
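
The executable check mentioned in Jim's message can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the actual NCSU workflow: instead of calling the Unix file utility, it compares the first bytes of each file against a small, hand-picked table of well-known magic numbers. The table is deliberately incomplete, and the names `sniff_executable` and `find_executables` are hypothetical.

```python
from pathlib import Path

# A few well-known leading byte sequences ("magic numbers").
# This table is illustrative only; the file utility knows far more formats.
EXECUTABLE_MAGIC = {
    b"\x7fELF": "ELF executable",
    b"MZ": "DOS/Windows executable",
    b"#!": "script with interpreter line",
    b"\xfe\xed\xfa\xce": "Mach-O executable (32-bit)",
    b"\xfe\xed\xfa\xcf": "Mach-O executable (64-bit)",
    b"\xce\xfa\xed\xfe": "Mach-O executable (32-bit, byte-swapped)",
    b"\xcf\xfa\xed\xfe": "Mach-O executable (64-bit, byte-swapped)",
}


def sniff_executable(path):
    """Return a label if the file starts with a known executable magic number."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for magic, label in EXECUTABLE_MAGIC.items():
        if head.startswith(magic):
            return label
    return None


def find_executables(root):
    """Walk a submission directory and map suspicious file paths to labels."""
    suspicious = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            label = sniff_executable(path)
            if label is not None:
                suspicious[str(path)] = label
    return suspicious
```

Checking content rather than file extensions matters here, because a renamed executable keeps its magic number. A match is only a reason for a closer look, not proof that a file is malicious.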

