[ 
https://jira.duraspace.org/browse/DS-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=19256#action_19256
 ] 

Richard Rodgers commented on DS-638:
------------------------------------

First, to Mark's observations: that is exactly one of the goals of the curation 
framework - to factor out common code that all ingest pathways (or, as you say, 
channels) use. I think you are exactly right that there has to be an 
architectural place after submission but before installation where some of this 
work is performed. I recruited DSpace workflow to play that part. This does, as 
Tim observes, require non-trivial reconfiguration for existing repositories, 
but in the end I think the flexibility & configurability will compensate for 
that. For example, we can leverage all the workflow notification machinery to 
tell us who is submitting infected files - much harder to build into submission.

Next, to Tim's:
Please don't take my remarks as opposing integration - we will likely need 
both, as you say. I was trying to understand what the real requirements were in 
this case.
As to format identification, I think that is the poster child for a sensible 
refactor as a curation task, and ultimately it likely doesn't belong in 
submission. Right now, to use just 3 examples:
(1) ItemImport - uses 'guessFormat' (file extension)
(2) SWORD - uses a mime-type that is passed in with the deposit (not verified 
at all against the file)
(3) Submission - uses guessFormat, followed by the ability for the submitter to 
completely override with an arbitrary, unchecked format type assignment
The result is that we have no confidence that files are properly 
format-identified, or even *uniformly* identified.
So do we go to the 5 modules and add calls to a DROID task, or do it once in 
workflow, consistently for all ingests?
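
To make the mismatch concrete, here is a minimal sketch of what a single, shared check might look like. Everything in it is hypothetical: the class and method names are made up for illustration, the signature table is deliberately tiny, and it is not the curation framework API, DROID, or the existing guessFormat code. It only contrasts an extension-based guess with a look at the leading bytes of the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch: not DSpace code, names are assumptions for illustration.
public class FormatCheckSketch {

    /** Extension-only guess, similar in spirit to trusting the file name. */
    public static String guessByExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "unknown";
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        // Treat the two common JPEG extensions as the same format.
        return ext.equals("jpeg") ? "jpg" : ext;
    }

    /** Content-based guess from the leading bytes (tiny illustrative table). */
    public static String guessByContent(byte[] head) {
        if (startsWith(head, "GIF87a") || startsWith(head, "GIF89a")) {
            return "gif";
        }
        if (startsWith(head, "%PDF-")) {
            return "pdf";
        }
        if (head.length >= 3 && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xFF) == 0xD8 && (head[2] & 0xFF) == 0xFF) {
            return "jpg";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, String ascii) {
        byte[] sig = ascii.getBytes(StandardCharsets.US_ASCII);
        if (data.length < sig.length) {
            return false;
        }
        for (int i = 0; i < sig.length; i++) {
            if (data[i] != sig[i]) {
                return false;
            }
        }
        return true;
    }

    /** True when the content is recognised and contradicts the extension. */
    public static boolean mismatch(String fileName, byte[] head) {
        String byContent = guessByContent(head);
        return !byContent.equals("unknown")
                && !byContent.equals(guessByExtension(fileName));
    }

    public static void main(String[] args) {
        byte[] gifHead = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        // A GIF renamed to .txt is flagged; the same bytes as .gif are not.
        System.out.println(mismatch("picture.txt", gifHead));
        System.out.println(mismatch("picture.gif", gifHead));
    }
}
```

If something like this ran once in workflow, ItemImport, SWORD and Submission would all get the same answer for the same bytes, which is the point of doing it in one place.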

> check files on input for viruses, and verify file format 
> ---------------------------------------------------------
>
>                 Key: DS-638
>                 URL: https://jira.duraspace.org/browse/DS-638
>             Project: DSpace
>          Issue Type: New Feature
>          Components: JSPUI
>    Affects Versions: 1.6.2
>         Environment: to use this patch you will need to have ClamAV, and 
> jhove installed on your system.
>            Reporter: Jose Blanco
>            Assignee: Robin Taylor
>         Attachments: java_files.zip, jhove_config_files.zip, jsp_files.zip
>
>
> This patch uses JHOVE to provide rough-and-ready format checking by 
> identifying that the file/bitstream extension matches formats verifiable by 
> JHOVE. (Currently DSpace accepts a deposit's file extension as gospel, so a 
> user could tack a ".txt" extension onto a GIF and DSpace would assign the 
> incorrect format to the file based on that incorrect extension.) 
> This patch also contains code to check the file for the presence of 
> viruses.
> In order to use this patch you must have jhove and ClamAV installed on your 
> system. 
> Important notes:
> (1) HTML identification by JHOVE has proved unreliable, so this patch 
> does not return accurate results for that file format.
> (2) This code does not fully incorporate JHOVE's validation functions; it 
> only verifies that what depositors intended to submit is in fact what they 
> submitted.
> The following are returned messages when an error is detected:
> Text in [brackets] is a returned value, ALLCAPS can/should be modified to 
> reflect your current installation.
> Questionable AIFF, GIF, JPG, PDF, TIF, WAVE, XML:
> DSPACE could not verify that your file is a valid [file_format_extension]. 
> Please check the file format and ".[file_format_extension]" extension.
> Questionable TXT:
> DSPACE found the text file you are trying to upload is neither UTF-8 nor 
> ASCII. Please verify that your file is in the format you wanted.
> Spaces in filenames (this is an additional check):
> The file name contains spaces; this is not recommended. If possible, please 
> replace spaces with underscores: "_".
> Virus detected:
> DSPACE detected a virus in this file. Please repair it and resume the 
> deposit. If you need assistance, please contact us: EMAIL_ADDRESS.
> To get the patch working:
> Add the jhove conf files to the
> [dspace]/jhove directory.
> Here are the conf files:
> jhove-aiff.conf
> jhove-ascii.conf
> jhove-gif.conf
> jhove-jpeg.conf
> jhove-pdf.conf
> jhove-tiff.conf
> jhove-utf8.conf
> jhove-wave.conf
> jhove-xml.conf
> Also the following files were changed:
> dspace-api/src/main/java/org/dspace/submit/step/UploadStep.java
> dspace-jspui/dspace-jspui-api/src/main/java/org/dspace/app/webui/submit/step/JSPUploadStep.java
> dspace-api/src/main/java/org/dspace/content/FormatIdentifier.java
> dspace/modules/jspui/src/main/webapp/submit/get-file-format.jsp (locally 
> customized)
> dspace/modules/jspui/src/main/webapp/submit/upload-error-virus.jsp (new file 
> - placed in locally modified area for the jspui interface)
> These files are attached with this patch.
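
For readers without the attachments, two of the simpler checks described above (the UTF-8/ASCII test for TXT files and the spaces-in-filename warning) could be sketched roughly as follows. This is an assumption about how such checks might be written with the standard Java library, not the code in the attached patch, which relies on JHOVE; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: not the attached patch, names are assumptions.
public class UploadChecksSketch {

    /**
     * True if the bytes decode cleanly as UTF-8. ASCII is a subset of
     * UTF-8, so a pure ASCII file also passes.
     */
    public static boolean isUtf8OrAscii(byte[] content) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** True if the file name contains at least one space character. */
    public static boolean hasSpaces(String fileName) {
        return fileName.indexOf(' ') >= 0;
    }
}
```

A caller would map a false result from isUtf8OrAscii to the "Questionable TXT" message and a true result from hasSpaces to the filename warning.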
