Hello all

In our repository http://elartu.tntu.edu.ua we often use DjVu documents
with a built-in text layer.
For full-text search we use DjVuFilter (see the links below).
([dspace-source]/dspace-api/src/main/java/org/dspace/app/mediafilter/DjVuFilter.java)
We have now upgraded from 1.5.2 to 1.8.1.

In version 1.5.2 the file DjVuFilter.java compiled fine, but in 1.8.1 it
does not compile anymore.

I am not a Java programmer, but the code is not long. Maybe the imported
modules (import ...;) need to be changed, or something else.
The build (mvn -X -U clean package) does not report any error for this file.

I would appreciate any ideas on how to restore DjVu support in DSpace 1.8.1.

Links:
1) Add support for DjVu-documents (https://jira.duraspace.org/browse/DS-49)
2) Add support for DjVu-documents
(http://dspace.2283337.n4.nabble.com/dspace-Patches-2234659-Add-support-for-DjVu-documents-td3291132.html)
3) DjVu file format in DSpace
(http://mailman.mit.edu/pipermail/dspace-general/2007-May/001513.html)

With best regards
 Serhij Dubyk
  Ukraine, TNTU

P.S.
===========================DjVuFilter.java==============================
/*
DjVuFilter.java
 Version: 0.1
 DSpace version: 1.4.2 beta
 Author: Ivan Penev
 e-mail: inpenev at gmail.com
*/

package org.dspace.app.mediafilter;

import java.io.InputStream;
import java.io.FileInputStream;
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.io.BufferedOutputStream;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.File;

/**
 * This class provides a media filter for processing files of type DjVu.
 * <p>The current implementation uses a program called
 <code>djvutxt</code>, which extracts the text layer from a previously
 OCR-ed DjVu file and saves it into a UTF-8 text document. The program
 is distributed with the <code>djvulibre</code> package which is freely
 available under the GPL license from <a
 href="http://djvu.sourceforge.net/">http://djvu.sourceforge.net/</a>
 for both Unix and Windows operating systems. Hence, for the media
 filter to work it is required that <code>djvutxt</code> is a valid
 command (in the working environment).</p>
*/

public class DjVuFilter extends MediaFilter
{
 /**
  * Get a filename for a newly created filtered bitstream.
  *
  * @param sourceName
  * name of source bitstream
  * @return filename generated by the filter - for example, document.djvu
  * becomes document.djvu.txt
 */

 public String getFilteredName(String sourceName)
 {
  return sourceName + ".txt";
 }

 /**
  * Get name of the bundle into which this filter will put its generated bitstreams.
  *
  * @return "TEXT"
 */
 public String getBundleName()
 {
  return "TEXT";
 }

 /**
  * Get name of the bitstream format returned by this filter.
  *
  * @return "Text"
 */

 public String getFormatString()
 {
  return "Text";
 }

 /**
  * Get a string describing the newly-generated bitstream.
  *
  * @return "Extracted text"
 */

 public String getDescription()
 {
  return "Extracted text";
 }

 /**
  * Get a bitstream filled with the extracted text from a DjVu bitstream.
  * <p>The bitstream supplied as a parameter is written to a DjVu
  file on the file system (in the working directory), and the system
  command <code>djvutxt</code> is called on the latter to produce a
  UTF-8 text file containing the extracted text. The file is then copied
  to a bitstream. Finally, the auxiliary files are removed from the file
  system, and the generated bitstream is returned as a result.</p>
  * <p>WARNING! Write access to the working directory is needed for
  this method to operate! No exception handling provided!</p>
  *
  * @param source
  * input stream
  *
  * @return result of filter's transformation, written out to a bitstream
 */

 public InputStream getDestinationStream(InputStream source) throws Exception
 {
  /* Some convenience initializations. */
  final String cmd = "djvutxt";
  final String fileName = "aux";
  final String djvuFileName = fileName + ".djvu";
  final String txtFileName = fileName + ".txt";

  /* Store input bitstream to auxiliary DjVu file. */
  File djvuFile = streamToFile(source, djvuFileName);

  /* Invoke external command djvutxt with appropriate arguments
   to do the actual job... */
  final String[] cmdArray = {cmd, djvuFileName, txtFileName};
  Process p = Runtime.getRuntime().exec(cmdArray);
  /* ...and wait for it to terminate */
  p.waitFor();

  /* Copy extracted text from file to an independent bitstream,
   and optionally print the text to standard output. */
  File txtFile = new File(txtFileName);
  InputStream dest = fileToStream(txtFile, MediaFilterManager.isVerbose);

  /* Then remove auxiliary files...*/
  djvuFile.delete();
  txtFile.delete();
  /* ...and return resulting bitstream. */
  return dest;
 }

 /**
  * Write given input stream to a file on the file system.
  * <p>WARNING! No exception handling!</p>
  *
  * @param inStream input stream
  * @param fileName name of the file to be generated
  *
  * @return <code>File</code> object associated with the generated file
  *
  * @throws Exception
 */

 private File streamToFile(InputStream inStream, String fileName)
 throws Exception
 {
  /* Data will be read from input stream in chunks of size e.g. 4KB. */
  final int chunkSize = 4096;
  byte[] byteArray = new byte[chunkSize];

  /* Open the stream for buffered reading. */
  InputStream bufInStream = new BufferedInputStream(inStream);

  /* Create an empty file (if the file already exists, it will be left
   untouched)
   to store the supplied bitstream... */
  File file = new File(fileName);
  file.createNewFile();
  /* ...and associate a buffered output stream with it. */
  OutputStream bufOutStream = new BufferedOutputStream(new FileOutputStream(file));

  /* Copy data from input stream to newly generated file. */
  int readBytes = -1;
  while ((readBytes = bufInStream.read(byteArray, 0, chunkSize)) != -1)
   bufOutStream.write(byteArray, 0, readBytes);

  /* Stop transactions to the file system... */
  bufOutStream.close();
  /* ...and return result. */
  return file;
 }

 /**
  * Produce input stream from a given file on the file system.
  * <p>WARNING! No exception handling!</p>
  *
  * @param file <code>File</code> object associated with the given file
  * @param verbose if true, also print the file contents to standard output
  *
  * @return input stream containing the data read from file
  *
  * @throws Exception
 */

 private InputStream fileToStream(File file, boolean verbose) throws Exception
 {
  /* Open the stream for reading. */
  InputStream inStream = new FileInputStream(file);

  /* Allocate necessary memory for data buffer. */
  byte[] byteArray = new byte[(int)file.length()];

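  /* Load file contents into buffer; a single read() call is not
   guaranteed to fill it, so loop until the buffer is full. */
  int offset = 0;
  int count;
  while (offset < byteArray.length
   && (count = inStream.read(byteArray, offset, byteArray.length - offset)) != -1)
  {
   offset += count;
  }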

  /* And immediately close transactions with the file system. */
  inStream.close();

  /* If required to send the retrieved data to standard output... */
  if (verbose)
  {
   /* Open the file again, but this time handle it as a character stream... */
   BufferedReader bufReader = new BufferedReader(new FileReader(file));
   /* ...then print its contents line by line to the standard output... */
   String lineOfText = null;
   while ((lineOfText = bufReader.readLine()) != null)
    System.out.println(lineOfText);
   /* ...and close connection to the file. */
   bufReader.close();
  }

  /* Finally, generate and return input stream containing desired data. */
  return new ByteArrayInputStream(byteArray);
 }

}

========================================================================
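
The filter above always writes to the fixed names aux.djvu and aux.txt in the
working directory and does not check the exit code of djvutxt. Below is a
minimal sketch of the same "stream to file, run djvutxt, read the result" idea
using unique temporary files. It is not the original filter and has not been
tested against DSpace. It assumes Java 7 or later and that djvutxt is on the
PATH. The names DjvuTextSketch, copyToTempFile, buildCommand and extractText
are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of DSpace: temp-file handling around djvutxt. */
class DjvuTextSketch {

    /** Copy a stream into a uniquely named temporary file and return its path. */
    static Path copyToTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("djvufilter", suffix);
        // createTempFile already created an empty file, so allow overwriting it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Build the same command line the filter uses: djvutxt <input> <output>. */
    static List<String> buildCommand(Path djvu, Path txt) {
        return Arrays.asList("djvutxt", djvu.toString(), txt.toString());
    }

    /** Run djvutxt (assumed to be installed) and return the extracted text. */
    static InputStream extractText(InputStream source)
            throws IOException, InterruptedException {
        Path djvu = copyToTempFile(source, ".djvu");
        Path txt = Files.createTempFile("djvufilter", ".txt");
        try {
            // inheritIO() forwards the child's output to this process so the
            // child cannot block on a full pipe buffer.
            Process p = new ProcessBuilder(buildCommand(djvu, txt))
                    .inheritIO()
                    .start();
            int exitCode = p.waitFor();
            if (exitCode != 0) {
                throw new IOException("djvutxt exited with code " + exitCode);
            }
            // Read the whole result into memory before the files are deleted.
            return new ByteArrayInputStream(Files.readAllBytes(txt));
        } finally {
            Files.deleteIfExists(djvu);
            Files.deleteIfExists(txt);
        }
    }
}
```

In a filter like the one above, getDestinationStream(source) could in
principle just return DjvuTextSketch.extractText(source). Whether that is
enough for DSpace 1.8.1 is an open question.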

