Patches item #2234659, was opened at 2008-11-07 17:06
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=319984&aid=2234659&group_id=19984

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Serhij Dubyk (dubyk)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add support for DjVu-documents

Initial Comment:
Hello All

This patch is based on
http://mailman.mit.edu/pipermail/dspace-general/2007-May/001513.html

In DSpace 1.5.0+ we need (before compilation):

1) Install the djvutxt utility (package djvulibre); on Debian:

apt-get install djvulibre-bin

2) Edit [dspace-source]/dspace/config/dspace.cfg, text block
"### Media Filter / Format Filter plugins", and add DjVu support in 3 places:

filter.plugins = ... \
        DjVu Text Extractor
plugin.named.org.dspace.app.mediafilter.FormatFilter = ... \
        org.dspace.app.mediafilter.DjVuFilter = DjVu Text Extractor
filter.org.dspace.app.mediafilter.DjVuFilter.inputFormats = DjVu

3) Edit [dspace-source]/dspace/config/registries/bitstream-formats.xml
and add the following:

<bitstream-type>
    <mimetype>image/vnd.djvu</mimetype>
    <short_description>DjVu</short_description>
    <description>DjVu</description>
    <support_level>1</support_level>
    <internal>false</internal>
    <extension>djvu</extension>
    <extension>djv</extension>
</bitstream-type>

4) Create the file
[dspace-source]/dspace-api/src/main/java/org/dspace/app/mediafilter/DjVuFilter.java
with the following content:

/*
  DjVuFilter.java

  Version: 0.1
  DSpace version: 1.4.2 beta
  Author: Ivan Penev
  e-mail: inpenev at gmail.com
*/

package org.dspace.app.mediafilter;

import java.io.InputStream;
import java.io.FileInputStream;
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.io.BufferedOutputStream;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.File;

/**
 * This class provides a media filter for processing files of type DjVu.
 * <p>The current implementation uses a program called <code>djvutxt</code>,
 * which extracts the text layer from a previously OCR-ed DjVu file and saves
 * it into a UTF-8 text document. The program is distributed with the
 * <code>djvulibre</code> package, which is freely available under the GPL
 * license from
 * <a href="http://djvu.sourceforge.net/">http://djvu.sourceforge.net/</a>
 * for both Unix and Windows operating systems. Hence, for the media filter
 * to work, <code>djvutxt</code> must be a valid command in the working
 * environment.</p>
 */
public class DjVuFilter extends MediaFilter {

    /**
     * Get a filename for a newly created filtered bitstream.
     *
     * @param sourceName
     *            name of source bitstream
     * @return filename generated by the filter - for example, document.djvu
     *         becomes document.djvu.txt
     */
    public String getFilteredName(String sourceName) {
        return sourceName + ".txt";
    }

    /**
     * Get name of the bundle this filter will stick its generated bitstreams into.
     *
     * @return "TEXT"
     */
    public String getBundleName() {
        return "TEXT";
    }

    /**
     * Get name of the bitstream format returned by this filter.
     *
     * @return "Text"
     */
    public String getFormatString() {
        return "Text";
    }

    /**
     * Get a string describing the newly-generated bitstream.
     *
     * @return "Extracted text"
     */
    public String getDescription() {
        return "Extracted text";
    }

    /**
     * Get a bitstream filled with the extracted text from a DjVu bitstream.
     * <p>The bitstream supplied as a parameter is written to a DjVu file on
     * the file system (in the working directory), and the system command
     * <code>djvutxt</code> is called on the latter to produce a UTF-8 text
     * file containing the extracted text. The file is then copied to a
     * bitstream. Finally, the auxiliary files are removed from the file
     * system, and the generated bitstream is returned as a result.</p>
     * <p>WARNING! Write access to the working directory is needed for this
     * method to operate!
     * No exception handling provided!</p>
     *
     * @param source
     *            input stream
     *
     * @return result of filter's transformation, written out to a bitstream
     */
    public InputStream getDestinationStream(InputStream source) throws Exception {
        /* Some convenience initializations. */
        final String cmd = "djvutxt";
        final String fileName = "aux";
        final String djvuFileName = fileName + ".djvu";
        final String txtFileName = fileName + ".txt";

        /* Store input bitstream to auxiliary DjVu file. */
        File djvuFile = streamToFile(source, djvuFileName);

        /* Invoke external command djvutxt with appropriate arguments to do the actual job... */
        final String[] cmdArray = {cmd, djvuFileName, txtFileName};
        Process p = Runtime.getRuntime().exec(cmdArray);
        /* ...and wait for it to terminate */
        p.waitFor();

        /* Copy extracted text from file to an independent bitstream,
           and optionally print the text to standard output. */
        File txtFile = new File(txtFileName);
        InputStream dest = fileToStream(txtFile, MediaFilterManager.isVerbose);

        /* Then remove auxiliary files... */
        djvuFile.delete();
        txtFile.delete();

        /* ...and return resulting bitstream. */
        return dest;
    }

    /**
     * Write given input stream to a file on the file system.
     * <p>WARNING! No exception handling!</p>
     *
     * @param inStream input stream
     * @param fileName name of the file to be generated
     *
     * @return <code>File</code> object associated with the generated file
     *
     * @throws Exception
     */
    private File streamToFile(InputStream inStream, String fileName) throws Exception {
        /* Data will be read from input stream in chunks of size e.g. 4KB. */
        final int chunkSize = 4096;
        byte[] byteArray = new byte[chunkSize];

        /* Open the stream for buffered reading. */
        InputStream bufInStream = new BufferedInputStream(inStream);

        /* Create an empty file (if the file already exists, it will be left untouched)
           to store the supplied bitstream... */
        File file = new File(fileName);
        file.createNewFile();
        /* ...and associate a buffered output stream with it.
         */
        OutputStream bufOutStream = new BufferedOutputStream(new FileOutputStream(file));

        /* Copy data from input stream to newly generated file. */
        int readBytes = -1;
        while ((readBytes = bufInStream.read(byteArray, 0, chunkSize)) != -1)
            bufOutStream.write(byteArray, 0, readBytes);

        /* Stop transactions to the file system... */
        bufOutStream.close();

        /* ...and return result. */
        return file;
    }

    /**
     * Produce input stream from a given file on the file system.
     * <p>WARNING! No exception handling!</p>
     *
     * @param file <code>File</code> object associated with the given file
     * @param verbose whether to also print the file contents to standard output
     *
     * @return input stream containing the data read from file
     *
     * @throws Exception
     */
    private InputStream fileToStream(File file, boolean verbose) throws Exception {
        /* Open the stream for reading. */
        InputStream inStream = new FileInputStream(file);

        /* Allocate necessary memory for data buffer. */
        byte[] byteArray = new byte[(int)file.length()];

        /* Load file contents into buffer. */
        inStream.read(byteArray);

        /* And immediately close transactions with the file system. */
        inStream.close();

        /* If required to send the retrieved data to standard output... */
        if (verbose) {
            /* Open the file again, but this time handle it as a character stream... */
            BufferedReader bufReader = new BufferedReader(new FileReader(file));

            /* ...then print its contents line by line to the standard output... */
            String lineOfText = null;
            while ((lineOfText = bufReader.readLine()) != null)
                System.out.println(lineOfText);

            /* ...and close connection to the file. */
            bufReader.close();
        }

        /* Finally, generate and return input stream containing desired data.
         */
        return new ByteArrayInputStream(byteArray);
    }
}

5) Compilation/recompilation:

cd [dspace-source]/dspace/dspace-1.5.0-src-release/dspace/
mvn package

6) Install. When recompiling for an existing installation instead, edit the
working bitstream-formats.xml and dspace.cfg as above, and replace
dspace-api-1.5.0.jar in the folders
webapps/jspui/WEB-INF/lib/,
lib/,
webapps/lni/WEB-INF/lib/,
webapps/oai/WEB-INF/lib/,
webapps/xmlui/WEB-INF/lib/
with the compiled [dspace-source]/dspace-api/target/dspace-api-1.5.0.jar

7) Don't forget to restart Tomcat and run

/usr/share/dspace/bin/filter-media

With best regards
Serhij Dubyk

----------------------------------------------------------------------

_______________________________________________
Dspace-devel mailing list
https://lists.sourceforge.net/lists/listinfo/dspace-devel
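
Note on step 4: getDestinationStream writes to the fixed names aux.djvu and aux.txt in the working directory, does not check the exit status of djvutxt, and fileToStream relies on a single read() call to fill its buffer. The following is only a minimal sketch of how those steps might be written with temporary files, assuming Java 7 or later (java.nio.file.Files, ProcessBuilder.inheritIO). The class DjVuTempFiles and its method names are hypothetical; they are not part of DSpace or of the patch above. The command is passed in as a parameter and is assumed to accept "command input output", as djvutxt is invoked in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration; not part of DSpace or the patch. */
final class DjVuTempFiles {

    /** Copy a stream into a uniquely named temporary file and return that file. */
    public static File streamToTempFile(InputStream in, String suffix) throws IOException {
        File tmp = File.createTempFile("djvufilter", suffix);
        // createTempFile already created the file, so the copy must replace it.
        Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Read the whole file into memory and expose it as an independent stream. */
    public static InputStream fileToStream(File file) throws IOException {
        return new ByteArrayInputStream(Files.readAllBytes(file.toPath()));
    }

    /**
     * Run an external extractor as "command input output" and return the
     * contents of the output file. Assumption: the command follows the same
     * argument order the patch uses for djvutxt.
     */
    public static InputStream extractText(String command, InputStream source)
            throws IOException, InterruptedException {
        File in = streamToTempFile(source, ".djvu");
        File out = File.createTempFile("djvufilter", ".txt");
        try {
            // inheritIO() hands the child our stdout/stderr, so it cannot
            // block on a pipe that nobody reads.
            Process p = new ProcessBuilder(command, in.getPath(), out.getPath())
                    .inheritIO()
                    .start();
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IOException(command + " exited with status " + exit);
            }
            return fileToStream(out);
        } finally {
            // Remove the auxiliary files whether or not extraction succeeded.
            in.delete();
            out.delete();
        }
    }
}
```

In a filter like the one above, getDestinationStream could then reduce to something like return DjVuTempFiles.extractText("djvutxt", source); this has not been tested against DSpace.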
