[ http://jira.dspace.org/jira/browse/DS-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Kiplagat resolved DS-49.
--------------------------------
Resolution: Won't Fix
> Add support for DjVu-documents - ID: 2234659
> --------------------------------------------
>
> Key: DS-49
> URL: http://jira.dspace.org/jira/browse/DS-49
> Project: DSpace 1.x
> Issue Type: Improvement
> Affects Versions: 1.5.0, 1.5.1, 1.5.2
> Reporter: Charles Kiplagat
>
> Hello all,
> This patch is based on
> http://mailman.mit.edu/pipermail/dspace-general/2007-May/001513.html
> In DSpace 1.5.0 and later, do the following before compiling:
> 1) Install the djvutxt utility (part of the djvulibre package). On Debian:
> apt-get install djvulibre-bin
> 2) Edit [dspace-source]/dspace/config/dspace.cfg, in the section
> "### Media Filter / Format Filter plugins",
> and add DjVu support in three places:
> filter.plugins = ... \
> DjVu Text Extractor
> plugin.named.org.dspace.app.mediafilter.FormatFilter = ... \
> org.dspace.app.mediafilter.DjVuFilter = DjVu Text Extractor
> filter.org.dspace.app.mediafilter.DjVuFilter.inputFormats = DjVu
> 3) Edit [dspace-source]/dspace/config/registries/bitstream-formats.xml
> and add the following entry:
> <bitstream-type>
> <mimetype>image/vnd.djvu</mimetype>
> <short_description>DjVu</short_description>
> <description>DjVu</description>
> <support_level>1</support_level>
> <internal>false</internal>
> <extension>djvu</extension>
> <extension>djv</extension>
> </bitstream-type>
> 4) Create the file
> [dspace-source]/dspace-api/src/main/java/org/dspace/app/mediafilter/DjVuFilter.java
> with the following content:
> /*
> DjVuFilter.java
> Version: 0.1
> DSpace version: 1.4.2 beta
> Author: Ivan Penev
> e-mail: inpenev at gmail.com
> */
> package org.dspace.app.mediafilter;
> import java.io.InputStream;
> import java.io.FileInputStream;
> import java.io.BufferedInputStream;
> import java.io.ByteArrayInputStream;
> import java.io.OutputStream;
> import java.io.FileOutputStream;
> import java.io.BufferedOutputStream;
> import java.io.FileReader;
> import java.io.BufferedReader;
> import java.io.File;
> /**
> * This class provides a media filter for processing files of type DjVu.
> * <p>The current implementation uses a program called
> * <code>djvutxt</code>, which extracts the text layer from a previously
> * OCR-ed DjVu file and saves it into a UTF-8 text document. The program
> * is distributed with the <code>djvulibre</code> package, which is freely
> * available under the GPL license from <a
> * href="http://djvu.sourceforge.net/">http://djvu.sourceforge.net/</a>
> * for both Unix and Windows operating systems. Hence, for the media
> * filter to work, <code>djvutxt</code> must be a valid
> * command in the working environment.</p>
> */
> public class DjVuFilter extends MediaFilter
> {
> /**
> * Get a filename for a newly created filtered bitstream.
> *
> * @param sourceName
> * name of source bitstream
> * @return filename generated by the filter - for example, document.djvu
> * becomes document.djvu.txt
> */
> public String getFilteredName(String sourceName)
> {
> return sourceName + ".txt";
> }
> /**
> * Get the name of the bundle in which this filter stores its generated
> * bitstreams.
> *
> * @return "TEXT"
> */
> public String getBundleName()
> {
> return "TEXT";
> }
> /**
> * Get name of the bitstream format returned by this filter.
> *
> * @return "Text"
> */
> public String getFormatString()
> {
> return "Text";
> }
> /**
> * Get a string describing the newly-generated bitstream.
> *
> * @return "Extracted text"
> */
> public String getDescription()
> {
> return "Extracted text";
> }
> /**
> * Get a bitstream filled with the extracted text from a DjVu bitstream.
> * <p>The bitstream supplied as a parameter is written to a DjVu
> * file on the file system (in the working directory), and the system
> * command <code>djvutxt</code> is called on the latter to produce a
> * UTF-8 text file containing the extracted text. The file is then copied
> * to a bitstream. Finally, the auxiliary files are removed from the file
> * system, and the generated bitstream is returned as a result.</p>
> * <p>WARNING! Write access to the working directory is needed for
> * this method to operate! No exception handling is provided!</p>
> *
> * @param source
> * input stream
> *
> * @return result of filter's transformation, written out to a bitstream
> */
> public InputStream getDestinationStream(InputStream source) throws Exception
> {
> /* Some convenience initializations. */
> final String cmd = "djvutxt";
> final String fileName = "aux";
> final String djvuFileName = fileName + ".djvu";
> final String txtFileName = fileName + ".txt";
> /* Store input bitstream to auxiliary DjVu file. */
> File djvuFile = streamToFile(source, djvuFileName);
> /* Invoke external command djvutxt with appropriate arguments
> to do the actual job... */
> final String[] cmdArray = {cmd, djvuFileName, txtFileName};
> Process p = Runtime.getRuntime().exec(cmdArray);
> /* ...and wait for it to terminate */
> p.waitFor();
> /* Copy extracted text from file to an independent bitstream,
> and optionally print the text to standard output. */
> File txtFile = new File(txtFileName);
> InputStream dest = fileToStream(txtFile, MediaFilterManager.isVerbose);
> /* Then remove auxiliary files...*/
> djvuFile.delete();
> txtFile.delete();
> /* ...and return resulting bitstream. */
> return dest;
> }
> /**
> * Write given input stream to a file on the file system.
> * <p>WARNING! No exception handling!</p>
> *
> * @param inStream input stream
> * @param fileName name of the file to be generated
> *
> * @return <code>File</code> object associated with the generated file
> *
> * @throws Exception
> */
> private File streamToFile(InputStream inStream, String fileName) throws Exception
> {
> /* Data will be read from input stream in chunks of size e.g. 4KB. */
> final int chunkSize = 4096;
> byte[] byteArray = new byte[chunkSize];
> /* Open the stream for buffered reading. */
> InputStream bufInStream = new BufferedInputStream(inStream);
> /* Create the file if it does not exist yet (createNewFile() does nothing
> if it already exists; the FileOutputStream below overwrites it either way)
> to store the supplied bitstream... */
> File file = new File(fileName);
> file.createNewFile();
> /* ...and associate a buffered output stream with it. */
> OutputStream bufOutStream = new BufferedOutputStream(new FileOutputStream(file));
> /* Copy data from input stream to newly generated file. */
> int readBytes = -1;
> while ((readBytes = bufInStream.read(byteArray, 0, chunkSize)) != -1)
> bufOutStream.write(byteArray, 0, readBytes);
> /* Stop transactions to the file system... */
> bufOutStream.close();
> /* ...and return result. */
> return file;
> }
> /**
> * Produce input stream from a given file on the file system.
> * <p>WARNING! No exception handling!</p>
> *
> * @param file <code>File</code> object associated with the given file
> * @param verbose if true, also print the file contents to standard output
> *
> * @return input stream containing the data read from file
> *
> * @throws Exception
> */
> private InputStream fileToStream(File file, boolean verbose) throws Exception
> {
> /* Open the stream for reading. */
> InputStream inStream = new FileInputStream(file);
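> /* Allocate necessary memory for data buffer. */
> byte[] byteArray = new byte[(int)file.length()];
> /* Load file contents into buffer. A single read() call may return fewer
> bytes than requested, so keep reading until the buffer is full. */
> int offset = 0;
> int readBytes;
> while (offset < byteArray.length
> && (readBytes = inStream.read(byteArray, offset, byteArray.length - offset)) != -1)
> offset += readBytes;
> /* And immediately close transactions with the file system. */
> inStream.close();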
> /* If required to send the retrieved data to standard output... */
> if (verbose)
> {
> /* Open the file again, but this time handle it as a character stream... */
> BufferedReader bufReader = new BufferedReader(new FileReader(file));
> /* ...then print its contents line by line to the standard output... */
> String lineOfText = null;
> while ((lineOfText = bufReader.readLine()) != null)
> System.out.println(lineOfText);
> /* ...and close connection to the file. */
> bufReader.close();
> }
> /* Finally, generate and return input stream containing desired data. */
> return new ByteArrayInputStream(byteArray);
> }
> }
> 5) Compile (or recompile):
> cd [dspace-source]/dspace/dspace-1.5.0-src-release/dspace/
> mvn package
> 6) Install. If you are recompiling for an existing installation instead, edit
> the working copies of bitstream-formats.xml and dspace.cfg as above, and
> replace dspace-api-1.5.0.jar in the folders
> webapps/jspui/WEB-INF/lib/, lib/, webapps/lni/WEB-INF/lib/,
> webapps/oai/WEB-INF/lib/ and webapps/xmlui/WEB-INF/lib/ with the compiled
> [dspace-source]/dspace-api/target/dspace-api-1.5.0.jar
> 7) Don't forget to restart Tomcat and run
> /usr/share/dspace/bin/filter-media
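
The filter in step 4 writes to fixed file names (aux.djvu, aux.txt) in the working directory and never checks whether djvutxt succeeded. A fixed name can collide between concurrent runs, and "aux" is a reserved device name on Windows. The following is only a sketch of one way to harden that part, not the patch author's code and not a DSpace API. The class DjVuTextSketch and its methods findOnPath and extractText are hypothetical names, and the sketch assumes that djvutxt accepts "djvutxt <input> <output>" (the invocation the patch already uses) and overwrites an existing output file.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

/** Hypothetical helper; not part of DSpace or of the original patch. */
final class DjVuTextSketch {

    /**
     * Looks for an executable regular file called name in the directories
     * listed in pathValue (for example the PATH environment variable).
     * Note: this does not try Windows suffixes such as ".exe".
     */
    static Optional<Path> findOnPath(String name, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /**
     * Copies source to a uniquely named temporary file, runs
     * "command <input> <output>" and returns the bytes of the output file.
     * I/O failures and a non-zero exit status surface as UncheckedIOException.
     */
    static byte[] extractText(InputStream source, String command) {
        Path in = null;
        Path out = null;
        try {
            in = Files.createTempFile("djvu-filter-", ".djvu");
            out = Files.createTempFile("djvu-filter-", ".txt");
            Files.copy(source, in, StandardCopyOption.REPLACE_EXISTING);
            Process process = new ProcessBuilder(command, in.toString(), out.toString())
                    .inheritIO() // let the tool's messages reach the console
                    .start();
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException(command + " exited with status " + exit);
            }
            return Files.readAllBytes(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + command, e);
        } finally {
            // Temporary files are removed even when the command fails.
            deleteQuietly(in);
            deleteQuietly(out);
        }
    }

    private static void deleteQuietly(Path path) {
        if (path == null) {
            return;
        }
        try {
            Files.deleteIfExists(path);
        } catch (IOException ignored) {
            // Best-effort cleanup only.
        }
    }
}
```

Inside getDestinationStream, the returned bytes could be wrapped in a ByteArrayInputStream, and findOnPath("djvutxt", System.getenv("PATH")) could be checked once up front to give a clearer error when djvulibre is not installed.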
> With best regards
> Serhij Dubyk
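
The three dspace.cfg additions in step 2 have to agree with each other: the name listed in filter.plugins must be the same name that plugin.named.org.dspace.app.mediafilter.FormatFilter maps to the filter class, and the class needs an inputFormats entry. The sketch below checks that for a piece of configuration text. It assumes dspace.cfg can be read with java.util.Properties syntax (including backslash line continuations); FilterConfigCheck and its problems method are hypothetical names, not DSpace code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical consistency check; not part of DSpace. */
final class FilterConfigCheck {
    static final String KEY_ENABLED = "filter.plugins";
    static final String KEY_NAMED = "plugin.named.org.dspace.app.mediafilter.FormatFilter";

    /** Returns a list of human-readable problems; empty if the entries agree. */
    static List<String> problems(String cfgText, String filterClass, String pluginName) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> problems = new ArrayList<>();

        // 1) The plugin name must appear in the comma-separated filter.plugins list.
        boolean enabled = false;
        for (String entry : props.getProperty(KEY_ENABLED, "").split(",")) {
            if (entry.trim().equals(pluginName)) {
                enabled = true;
            }
        }
        if (!enabled) {
            problems.add(pluginName + " is not listed in " + KEY_ENABLED);
        }

        // 2) The named-plugin list must contain "<class> = <name>".
        boolean named = false;
        for (String entry : props.getProperty(KEY_NAMED, "").split(",")) {
            String[] parts = entry.split("=", 2);
            if (parts.length == 2
                    && parts[0].trim().equals(filterClass)
                    && parts[1].trim().equals(pluginName)) {
                named = true;
            }
        }
        if (!named) {
            problems.add(filterClass + " is not mapped to " + pluginName + " in " + KEY_NAMED);
        }

        // 3) The filter needs a non-empty inputFormats entry.
        String formatsKey = "filter." + filterClass + ".inputFormats";
        if (props.getProperty(formatsKey, "").trim().isEmpty()) {
            problems.add(formatsKey + " is missing or empty");
        }
        return problems;
    }
}
```

Calling problems(cfgText, "org.dspace.app.mediafilter.DjVuFilter", "DjVu Text Extractor") on the edited configuration text before running filter-media would report a name that was added in one place but misspelled or forgotten in another.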
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://jira.dspace.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel