Hello all,

In our repository (http://elartu.tntu.edu.ua) we often use DjVu documents with a built-in text layer. For full-text search we use DjVuFilter (see the links below), located at [dspace-source]/dspace-api/src/main/java/org/dspace/app/mediafilter/DjVuFilter.java. We have now upgraded from 1.5.2 to 1.8.1.
In version 1.5.2 the file DjVuFilter.java compiled fine, but in 1.8.1 it no longer does. I am not a Java programmer, but the code is not long. Maybe the imported modules (import ...;) need to be changed, or something else. The build (mvn -X -U clean package) does not report any error for this file. I would appreciate any ideas on how to restore DjVu support in DSpace 1.8.1.

Links:
1) Add support for DjVu-documents (https://jira.duraspace.org/browse/DS-49)
2) Add support for DjVu-documents (http://dspace.2283337.n4.nabble.com/dspace-Patches-2234659-Add-support-for-DjVu-documents-td3291132.html)
3) DjVu file format in DSpace (http://mailman.mit.edu/pipermail/dspace-general/2007-May/001513.html)

With best regards,
Serhij Dubyk
Ukraine, TNTU

P.S.
===========================DjVuFilter.java===============================
/*
  DjVuFilter.java

  Version: 0.1
  DSpace version: 1.4.2 beta
  Author: Ivan Penev
  e-mail: inpenev at gmail.com
*/

package org.dspace.app.mediafilter;

import java.io.InputStream;
import java.io.FileInputStream;
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.io.BufferedOutputStream;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.File;

/**
 * This class provides a media filter for processing files of type DjVu.
 * <p>The current implementation uses a program called <code>djvutxt</code>, which
 * extracts the text layer from a previously OCR-ed DjVu file and saves it into a
 * UTF-8 text document. The program is distributed with the <code>djvulibre</code>
 * package which is freely available under the GPL license from
 * <a href="http://djvu.sourceforge.net/">http://djvu.sourceforge.net/</a> for both
 * Unix and Windows operating systems.
 * Hence, for the media filter to work it is required that <code>djvutxt</code> is a
 * valid command (in the working environment).</p>
 */
public class DjVuFilter extends MediaFilter
{
    /**
     * Get a filename for a newly created filtered bitstream.
     *
     * @param sourceName
     *            name of source bitstream
     * @return filename generated by the filter - for example, document.djvu
     *         becomes document.djvu.txt
     */
    public String getFilteredName(String sourceName)
    {
        return sourceName + ".txt";
    }

    /**
     * Get the name of the bundle into which this filter will put its generated
     * bitstreams.
     *
     * @return "TEXT"
     */
    public String getBundleName()
    {
        return "TEXT";
    }

    /**
     * Get name of the bitstream format returned by this filter.
     *
     * @return "Text"
     */
    public String getFormatString()
    {
        return "Text";
    }

    /**
     * Get a string describing the newly-generated bitstream.
     *
     * @return "Extracted text"
     */
    public String getDescription()
    {
        return "Extracted text";
    }

    /**
     * Get a bitstream filled with the extracted text from a DjVu bitstream.
     * <p>The bitstream supplied as a parameter is written to a DjVu file on the
     * file system (in the working directory), and the system command
     * <code>djvutxt</code> is called on the latter to produce a UTF-8 text file
     * containing the extracted text. The file is then copied to a bitstream.
     * Finally, the auxiliary files are removed from the file system, and the
     * generated bitstream is returned as a result.</p>
     * <p>WARNING! Write access to the working directory is needed for this
     * method to operate! No exception handling provided!</p>
     *
     * @param source
     *            input stream
     *
     * @return result of filter's transformation, written out to a bitstream
     */
    public InputStream getDestinationStream(InputStream source) throws Exception
    {
        /* Some convenience initializations. */
        final String cmd = "djvutxt";
        final String fileName = "aux";
        final String djvuFileName = fileName + ".djvu";
        final String txtFileName = fileName + ".txt";

        /* Store input bitstream to auxiliary DjVu file. */
        File djvuFile = streamToFile(source, djvuFileName);

        /* Invoke external command djvutxt with appropriate arguments to do the actual job... */
        final String[] cmdArray = {cmd, djvuFileName, txtFileName};
        Process p = Runtime.getRuntime().exec(cmdArray);
        /* ...and wait for it to terminate. */
        p.waitFor();

        /* Copy extracted text from file to an independent bitstream, and
           optionally print the text to standard output. */
        File txtFile = new File(txtFileName);
        InputStream dest = fileToStream(txtFile, MediaFilterManager.isVerbose);

        /* Then remove auxiliary files... */
        djvuFile.delete();
        txtFile.delete();

        /* ...and return resulting bitstream. */
        return dest;
    }

    /**
     * Write given input stream to a file on the file system.
     * <p>WARNING! No exception handling!</p>
     *
     * @param inStream input stream
     * @param fileName name of the file to be generated
     *
     * @return <code>File</code> object associated with the generated file
     *
     * @throws Exception
     */
    private File streamToFile(InputStream inStream, String fileName) throws Exception
    {
        /* Data will be read from input stream in chunks of size e.g. 4KB. */
        final int chunkSize = 4096;
        byte[] byteArray = new byte[chunkSize];

        /* Open the stream for buffered reading. */
        InputStream bufInStream = new BufferedInputStream(inStream);

        /* Create an empty file (if the file already exists, it will be left
           untouched) to store the supplied bitstream... */
        File file = new File(fileName);
        file.createNewFile();
        /* ...and associate a buffered output stream with it. */
        OutputStream bufOutStream = new BufferedOutputStream(new FileOutputStream(file));

        /* Copy data from input stream to newly generated file. */
        int readBytes = -1;
        while ((readBytes = bufInStream.read(byteArray, 0, chunkSize)) != -1)
            bufOutStream.write(byteArray, 0, readBytes);

        /* Stop transactions to the file system... */
        bufOutStream.close();

        /* ...and return result. */
        return file;
    }

    /**
     * Produce input stream from a given file on the file system.
     * <p>WARNING! No exception handling!</p>
     *
     * @param file <code>File</code> object associated with the given file
     * @param verbose if true, also print the file contents to standard output
     *
     * @return input stream containing the data read from file
     *
     * @throws Exception
     */
    private InputStream fileToStream(File file, boolean verbose) throws Exception
    {
        /* Open the stream for reading. */
        InputStream inStream = new FileInputStream(file);

        /* Allocate necessary memory for data buffer. */
        byte[] byteArray = new byte[(int)file.length()];

        /* Load file contents into buffer. */
        inStream.read(byteArray);

        /* And immediately close transactions with the file system. */
        inStream.close();

        /* If required to send the retrieved data to standard output... */
        if (verbose)
        {
            /* Open the file again, but this time handle it as a character stream... */
            BufferedReader bufReader = new BufferedReader(new FileReader(file));

            /* ...then print its contents line by line to the standard output... */
            String lineOfText = null;
            while ((lineOfText = bufReader.readLine()) != null)
                System.out.println(lineOfText);

            /* ...and close connection to the file. */
            bufReader.close();
        }

        /* Finally, generate and return input stream containing desired data. */
        return new ByteArrayInputStream(byteArray);
    }
}
========================================================================
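
For anyone who wants to check the djvutxt step outside DSpace, the approach the filter describes (write the stream to a file, run djvutxt on it, read the text file back) can be sketched as a small standalone class. This is only a rough sketch and not the filter itself: the class name DjVuTextSketch and the helpers buildCommand and extractText are hypothetical, it assumes Java 7 or later (for java.nio.file), and it assumes djvutxt is on the PATH and accepts an input file followed by an output file, as the filter above already does. It uses uniquely named temporary files instead of fixed names in the working directory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

/** Hypothetical standalone sketch; not part of DSpace. */
final class DjVuTextSketch {

    /** Builds the command line "tool input.djvu output.txt". */
    static List<String> buildCommand(String tool, Path djvu, Path txt) {
        return Arrays.asList(tool, djvu.toString(), txt.toString());
    }

    /**
     * Copies the source stream to a temporary DjVu file, runs the given tool
     * (assumed to be djvutxt) on it, and returns the extracted text as an
     * in-memory stream. Temporary files are removed in all cases.
     */
    static InputStream extractText(InputStream source, String tool)
            throws IOException, InterruptedException {
        // Unique temporary files avoid clashes between concurrent runs.
        Path djvu = Files.createTempFile("djvu-filter", ".djvu");
        Path txt = Files.createTempFile("djvu-filter", ".txt");
        try {
            // createTempFile already created the file, so overwrite it.
            Files.copy(source, djvu, StandardCopyOption.REPLACE_EXISTING);

            // inheritIO() sends the tool's messages to our console, so the
            // child process cannot block on a full output pipe.
            Process p = new ProcessBuilder(buildCommand(tool, djvu, txt))
                    .inheritIO()
                    .start();
            int exitCode = p.waitFor();
            if (exitCode != 0) {
                throw new IOException(tool + " exited with code " + exitCode);
            }

            // Read the whole text file before the finally block deletes it.
            return new ByteArrayInputStream(Files.readAllBytes(txt));
        } finally {
            Files.deleteIfExists(djvu);
            Files.deleteIfExists(txt);
        }
    }
}
```

Calling DjVuTextSketch.extractText(new FileInputStream("some.djvu"), "djvutxt") from a small test program should show whether djvutxt itself works on the server, independently of how DSpace 1.8.1 builds or registers the filter.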
