Hello,
any experience and effort is welcome!
The DSpace community exchanges opinions, comments and experiences on many
different channels (mailing lists, the bug and patch trackers on sourceforge.net).
Your contribution is a patch, i.e. a new feature.
You can submit it at:
http://sourceforge.net/tracker/?group_id=19984&atid=319984
If you have any problems, please let me know and I can do it for you.

I have also forwarded this response to the dspace-tech mailing list, which is
more appropriate for this topic.
Thank you again for your contribution, and welcome to the DSpace community.
Best wishes,
Andrea

Иван Пенев wrote:
> On Tue Jul 11 04:41:23 EDT 2006 Jama Poulsen wrote:
>   
>>  Something else. Has anyone worked with DjVu files and DSpace?
>>
>>  Some DjVu links:
>>  - http://en.wikipedia.org/wiki/DjVu
>>  - http://djvulibre.djvuzone.org
>>  - http://www.djvuzone.org/links/ (example archives)
>>  - http://www.djvuzone.org
>>  - http://any2djvu.djvuzone.org/
>>  - http://www.archive.org/details/newrock
>>
>>  If not I'd like to discuss this anyway :-)
>>
>>     
>
>    Dear Jama Poulsen, (and everybody interested in this subject...)
>
>    I have recently started to use the DSpace software.
>    I am neither a librarian nor an IT specialist, just a student, and
> for now I would only like to manage my own collection of mathematics
> books (collected from various sites on the Internet), most of which
> have been scanned from paper and stored in DjVu format.
>    As you know, there is a project on <sourceforge.net>, "djvulibre",
> which provides an open-source implementation of DjVu. The package
> includes a utility, "djvutxt", for extracting the text layer from a
> previously OCR-ed DjVu file. I have just written a MediaFilter class
> that invokes this utility to get the extracted text. So far it works
> well, but I haven't tested it much yet. Nevertheless, I would like to
> share the code with the members of the DSpace community, who may want
> to improve it. I have only entry-level Java programming skills, so the
> code is most likely inefficient and/or buggy.
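Since the filter depends on running djvutxt as an external process, here is a minimal sketch of a careful invocation from Java, assuming Java 7 or later. `CommandRunner` and its `Result` type are hypothetical names, not part of DSpace. The sketch merges stderr into stdout, drains that output before waiting (a child whose output pipe fills up and is never read can block forever), and returns the exit status so that a failed extraction is detected instead of silently producing an empty text file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper (not a DSpace class): runs a command and reports how it ended. */
public class CommandRunner
{
    /** Exit status plus everything the command wrote to stdout and stderr. */
    public static final class Result
    {
        public final int exitCode;
        public final String output;

        Result(int exitCode, String output)
        {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    public static Result run(List<String> command)
        throws IOException, InterruptedException
    {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout so a single loop drains both streams.
        pb.redirectErrorStream(true);
        Process p = pb.start(); // throws IOException if the command is not found

        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        try (InputStream out = p.getInputStream())
        {
            byte[] buf = new byte[4096];
            int n;
            while ((n = out.read(buf)) != -1)
            {
                captured.write(buf, 0, n);
            }
        }
        // Only once the output is fully drained do we wait for the exit status.
        int exitCode = p.waitFor();
        return new Result(exitCode,
                new String(captured.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

A call such as `CommandRunner.run(Arrays.asList("djvutxt", "book.djvu", "book.txt"))` would then let the caller treat any non-zero `exitCode` as an error and include `output` in the message.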
>    What I actually did was to put the following lines in the
> [dspace-source]/config/dspace.cfg file:
> plugin.sequence.org.dspace.app.mediafilter.MediaFilter = \
>     org.dspace.app.mediafilter.DjVuFilter, \
> ...
> filter.org.dspace.app.mediafilter.DjVuFilter.inputFormats = DjVu
> as well as to add the following element to
> [dspace-source]/config/registries/bitstream-formats.xml:
>   <bitstream-type>
>         <mimetype>image/vnd.djvu</mimetype>
>         <short_description>DjVu</short_description>
>         <description>DjVu</description>
>         <support_level>1</support_level>
>         <internal>false</internal>
>         <extension>djvu</extension>
>         <extension>djv</extension>
>   </bitstream-type>
> and to put the source file DjVuFilter.java in the
> [dspace-source]/src/org/dspace/app/mediafilter directory before running
> "ant fresh_install".
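The dspace.cfg lines above continue the plugin sequence onto following lines with a trailing backslash. A small sketch for checking such lines, assuming dspace.cfg follows `java.util.Properties` syntax (which is where the backslash continuation convention comes from): it parses the text and splits the comma-separated plugin list. `FilterConfigCheck` is a hypothetical name, not a DSpace class. As far as I can tell, the `inputFormats` value must match the `short_description` of the registry entry, which here is `DjVu`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper (not a DSpace class) for sanity-checking dspace.cfg-style lines. */
public class FilterConfigCheck
{
    /** Parses properties text; assumes dspace.cfg uses java.util.Properties syntax. */
    public static Properties parse(String text) throws IOException
    {
        Properties props = new Properties();
        // Properties.load joins lines that end in a backslash and drops the
        // leading whitespace of each continuation line.
        props.load(new StringReader(text));
        return props;
    }

    /** Splits a comma-separated value into trimmed, non-empty entries. */
    public static List<String> splitList(String value)
    {
        List<String> items = new ArrayList<>();
        if (value == null)
        {
            return items; // property not set at all
        }
        for (String part : value.split(","))
        {
            String trimmed = part.trim();
            if (!trimmed.isEmpty())
            {
                items.add(trimmed);
            }
        }
        return items;
    }
}
```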
>
> Here is the source code:
> -------------------------------------DjVuFilter.java-------------------------------------
>
> /*
>    DjVuFilter.java
>    Version: 0.1
>    DSpace version: 1.4.2 beta
>    Author: Ivan Penev
>    e-mail: [EMAIL PROTECTED]
> */
>
> package org.dspace.app.mediafilter;
>
> import java.io.InputStream;
> import java.io.FileInputStream;
> import java.io.BufferedInputStream;
> import java.io.ByteArrayInputStream;
> import java.io.OutputStream;
> import java.io.FileOutputStream;
> import java.io.BufferedOutputStream;
> import java.io.FileReader;
> import java.io.BufferedReader;
> import java.io.File;
>
> /**
>  * This class provides a media filter for processing files of type DjVu.
>  * <p>The current implementation uses a program called <code>djvutxt</code>,
>  * which extracts the text layer from a previously OCR-ed DjVu file and saves
>  * it into a UTF-8 text document. The program is distributed with the
>  * <code>djvulibre</code> package, which is freely available under the GPL
>  * license from <a href="http://djvu.sourceforge.net/">http://djvu.sourceforge.net/</a>
>  * for both Unix and Windows operating systems. Hence, for the media filter
>  * to work, <code>djvutxt</code> must be a valid command in the working
>  * environment.</p>
>  */
> public class DjVuFilter extends MediaFilter
> {
>     /**
>      * Get a filename for a newly created filtered bitstream.
>      *
>      * @param sourceName
>      *            name of source bitstream
>      * @return filename generated by the filter - for example, document.djvu
>      *         becomes document.djvu.txt
>      */
>     public String getFilteredName(String sourceName)
>     {
>         return sourceName + ".txt";
>     }
>
>     /**
>      * Get the name of the bundle in which this filter stores its generated
>      * bitstreams.
>      *
>      * @return "TEXT"
>      */
>     public String getBundleName()
>     {
>         return "TEXT";
>     }
>
>     /**
>      * Get the name of the bitstream format returned by this filter.
>      *
>      * @return "Text"
>      */
>     public String getFormatString()
>     {
>         return "Text";
>     }
>
>     /**
>      * Get a string describing the newly generated bitstream.
>      *
>      * @return "Extracted text"
>      */
>     public String getDescription()
>     {
>         return "Extracted text";
>     }
>
>     /**
>      * Get a bitstream filled with the text extracted from a DjVu bitstream.
>      * <p>The bitstream supplied as a parameter is written to a temporary DjVu
>      * file, and the system command <code>djvutxt</code> is called on it to
>      * produce a UTF-8 text file containing the extracted text. The text file
>      * is then copied to a bitstream. Finally, the temporary files are removed
>      * from the file system, and the generated bitstream is returned.</p>
>      * <p>Note: write access to the system temporary directory is required,
>      * and any failure (including a non-zero exit status from
>      * <code>djvutxt</code>) is reported as an exception.</p>
>      *
>      * @param source
>      *            input stream
>      *
>      * @return result of filter's transformation, written out to a bitstream
>      */
>     public InputStream getDestinationStream(InputStream source)
>         throws Exception
>     {
>         final String cmd = "djvutxt";
>
>         /* Use unique temporary files: a fixed name such as "aux" would clash
>            between concurrent runs, and AUX is a reserved device name on Windows. */
>         File djvuFile = File.createTempFile("djvufilter", ".djvu");
>         File txtFile = File.createTempFile("djvufilter", ".txt");
>         try
>         {
>             /* Store the input bitstream in the temporary DjVu file. */
>             streamToFile(source, djvuFile.getPath());
>
>             /* Invoke djvutxt; merge stderr into stdout and drain it, so the
>                child process cannot block on a full output pipe... */
>             final String[] cmdArray = {cmd, djvuFile.getPath(), txtFile.getPath()};
>             ProcessBuilder pb = new ProcessBuilder(cmdArray);
>             pb.redirectErrorStream(true);
>             Process p = pb.start();
>             InputStream procOut = p.getInputStream();
>             byte[] sink = new byte[4096];
>             while (procOut.read(sink) != -1)
>             {
>                 /* Discard diagnostic output. */
>             }
>             procOut.close();
>
>             /* ...then wait for it to terminate and check that it succeeded. */
>             int exitCode = p.waitFor();
>             if (exitCode != 0)
>             {
>                 throw new Exception(cmd + " exited with status " + exitCode);
>             }
>
>             /* Copy the extracted text to an independent bitstream,
>                optionally printing it to standard output. */
>             return fileToStream(txtFile, MediaFilterManager.isVerbose);
>         }
>         finally
>         {
>             /* Remove the temporary files whether or not extraction succeeded. */
>             djvuFile.delete();
>             txtFile.delete();
>         }
>     }
>       
>     /**
>      * Write the given input stream to a file on the file system.
>      * <p>Exceptions are propagated to the caller; the output file is
>      * closed in any case.</p>
>      *
>      * @param inStream input stream
>      * @param fileName name (or path) of the file to be written; an
>      *            existing file is overwritten
>      *
>      * @return <code>File</code> object associated with the written file
>      *
>      * @throws Exception
>      */
>     private File streamToFile(InputStream inStream, String fileName)
>         throws Exception
>     {
>         /* Data is copied in chunks of 4 KB. */
>         final int chunkSize = 4096;
>         byte[] byteArray = new byte[chunkSize];
>
>         /* Open the input for buffered reading. */
>         InputStream bufInStream = new BufferedInputStream(inStream);
>
>         /* Open the target file for buffered writing; FileOutputStream
>            creates the file, or truncates it if it already exists. */
>         File file = new File(fileName);
>         OutputStream bufOutStream =
>             new BufferedOutputStream(new FileOutputStream(file));
>         try
>         {
>             /* Copy data from the input stream to the file. */
>             int readBytes;
>             while ((readBytes = bufInStream.read(byteArray, 0, chunkSize)) != -1)
>             {
>                 bufOutStream.write(byteArray, 0, readBytes);
>             }
>         }
>         finally
>         {
>             /* Flush and close the file even if copying fails. */
>             bufOutStream.close();
>         }
>         return file;
>     }
>       
>     /**
>      * Produce an input stream from a given file on the file system.
>      * <p>Exceptions are propagated to the caller.</p>
>      *
>      * @param file <code>File</code> object associated with the given file
>      * @param verbose if <code>true</code>, also print the file's contents
>      *            (decoded as UTF-8) to standard output
>      *
>      * @return input stream containing the data read from file
>      *
>      * @throws Exception
>      */
>     private InputStream fileToStream(File file, boolean verbose)
>         throws Exception
>     {
>         /* Allocate a buffer large enough for the whole file. */
>         byte[] byteArray = new byte[(int) file.length()];
>         int offset = 0;
>
>         /* Load the file contents; a single read() call may return fewer
>            bytes than requested, so keep reading until the buffer is full. */
>         InputStream inStream = new FileInputStream(file);
>         try
>         {
>             while (offset < byteArray.length)
>             {
>                 int n = inStream.read(byteArray, offset, byteArray.length - offset);
>                 if (n == -1)
>                 {
>                     break;
>                 }
>                 offset += n;
>             }
>         }
>         finally
>         {
>             inStream.close();
>         }
>
>         /* If required, print the text to standard output, decoding it as
>            UTF-8 (the encoding djvutxt writes) rather than the platform default. */
>         if (verbose)
>         {
>             System.out.println(new String(byteArray, 0, offset, "UTF-8"));
>         }
>
>         /* Finally, return an in-memory stream containing the data read. */
>         return new ByteArrayInputStream(byteArray, 0, offset);
>     }
> }
>
> ----------------------------------End of source code----------------------------------
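For readers on a newer Java runtime, here is a compact alternative to the two file helpers, as a sketch assuming Java 7 or later (`java.nio.file`). `DjVuFilterIO` and its method names are hypothetical, not part of DSpace. `Files.createTempFile` gives each run its own file name, `Files.copy` streams the input to disk, and `Files.readAllBytes` always returns the complete file rather than whatever a single `read()` call happened to deliver.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not a DSpace class) for the filter's temporary-file I/O. */
public class DjVuFilterIO
{
    /** Copies the whole stream into a fresh temporary file and returns its path. */
    public static Path streamToTempFile(InputStream in, String suffix)
        throws IOException
    {
        // A unique temp file avoids clashes between concurrent runs.
        Path file = Files.createTempFile("djvufilter", suffix);
        // createTempFile already created the file, so overwriting must be allowed.
        Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
        return file;
    }

    /** Reads the complete file, optionally echoing it to stdout as UTF-8 text. */
    public static byte[] readAll(Path file, boolean verbose) throws IOException
    {
        byte[] data = Files.readAllBytes(file);
        if (verbose)
        {
            // djvutxt writes UTF-8, so decode explicitly, not with the platform default.
            System.out.println(new String(data, StandardCharsets.UTF_8));
        }
        return data;
    }
}
```

In the filter itself, the calling code would typically wrap the work in `try`/`finally` and call `Files.deleteIfExists` on both temporary paths, so they are removed even when djvutxt fails.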
>
> Please excuse my poor English and superfluous verbosity!
>
> Best wishes!
>
> Ivan Penev
> _______________________________________________
> Dspace-general mailing list
> [EMAIL PROTECTED]
> http://mailman.mit.edu/mailman/listinfo/dspace-general
>
>
>   


_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
