Hello,

any experience and effort are welcome! The DSpace community exchanges opinions, comments and experiences on many different channels (mailing lists, bug reports and patches on sourceforge.net).

Your contribution is a "patch": a new feature. You can submit it at:
http://sourceforge.net/tracker/?group_id=19984&atid=319984

If you have any problems, please let me know and I can submit it for you.
I have also forwarded this response to the dspace-tech mailing list, which is the more appropriate list for this topic.

Thank you again for your contribution, and welcome to the DSpace community.

Best wishes,
Andrea

Иван Пенев wrote:
> On Tue Jul 11 04:41:23 EDT 2006 Jama Poulsen wrote:
>
>> Something else. Has anyone worked with DjVu files and DSpace?
>>
>> Some DjVu links:
>> - http://en.wikipedia.org/wiki/DjVu
>> - http://djvulibre.djvuzone.org
>> - http://www.djvuzone.org/links/ (example archives)
>> - http://www.djvuzone.org
>> - http://any2djvu.djvuzone.org/
>> - http://www.archive.org/details/newrock
>>
>> If not I'd like to discuss this anyway :-)
>>
>
> Dear Jama Poulsen, (and everybody interested in this subject...)
>
> I have recently started to use the DSpace software.
> I am neither a librarian nor an IT specialist, just a student, and
> for now I would only like to manage my own collection of mathematics
> books (collected from various sites on the Internet), most of
> which have been scanned from paper and stored in DjVu format.
> As you know, there is a project on <sourceforge.net>, "djvulibre",
> which provides an open-source implementation of DjVu. The package
> includes a utility, "djvutxt", for extracting the text layer from a
> previously OCR-ed DjVu file. I have just written a MediaFilter class
> that invokes this utility to get the extracted text. For now, it
> works well, but I haven't done many tests with it yet. Nevertheless, I
> would like to share the code with the members of the DSpace community,
> who may want to improve it. I have only entry-level Java programming
> skills, so the code is most likely inefficient and/or buggy.
>
> What I actually did was to put the following lines in the
> [dspace-source]/config/dspace.cfg file:
>
> plugin.sequence.org.dspace.app.mediafilter.MediaFilter = \
>     org.dspace.app.mediafilter.DjVuFilter, \
>     ...
> filter.org.dspace.app.mediafilter.DjVuFilter.inputFormats = DjVu
>
> as well as to add the following element to
> [dspace-source]/config/registries/bitstream-formats.xml:
>
>   <bitstream-type>
>     <mimetype>image/vnd.djvu</mimetype>
>     <short_description>DjVu</short_description>
>     <description>DjVu</description>
>     <support_level>1</support_level>
>     <internal>false</internal>
>     <extension>djvu</extension>
>     <extension>djv</extension>
>   </bitstream-type>
>
> and to put the source code DjVuFilter.java in the
> [dspace-source]/src/org/dspace/app/mediafilter directory before running
> "ant fresh_install".
>
> Here is the source code:
> -------------------------------------DjVuFilter.java-------------------------------------
>
> /*
>     DjVuFilter.java
>     Version: 0.1
>     DSpace version: 1.4.2 beta
>     Author: Ivan Penev
>     e-mail: [EMAIL PROTECTED]
> */
>
> package org.dspace.app.mediafilter;
>
> import java.io.InputStream;
> import java.io.FileInputStream;
> import java.io.BufferedInputStream;
> import java.io.ByteArrayInputStream;
> import java.io.OutputStream;
> import java.io.FileOutputStream;
> import java.io.BufferedOutputStream;
> import java.io.FileReader;
> import java.io.BufferedReader;
> import java.io.File;
>
> /**
>  * This class provides a media filter for processing files of type DjVu.
>  * <p>The current implementation uses a program called
>  * <code>djvutxt</code>, which extracts the text layer from a previously
>  * OCR-ed DjVu file and saves it into a UTF-8 text document. The program
>  * is distributed with the <code>djvulibre</code> package which is freely
>  * available under the GPL license from <a
>  * href="http://djvu.sourceforge.net/">http://djvu.sourceforge.net/</a>
>  * for both Unix and Windows operating systems. Hence, for the media
>  * filter to work it is required that <code>djvutxt</code> is a valid
>  * command (in the working environment).</p>
>  */
> public class DjVuFilter extends MediaFilter
> {
>     /**
>      * Get a filename for a newly created filtered bitstream.
>      *
>      * @param sourceName
>      *            name of source bitstream
>      * @return filename generated by the filter - for example, document.djvu
>      *         becomes document.djvu.txt
>      */
>     public String getFilteredName(String sourceName)
>     {
>         return sourceName + ".txt";
>     }
>
>     /**
>      * Get the name of the bundle in which this filter will put its
>      * generated bitstreams.
>      *
>      * @return "TEXT"
>      */
>     public String getBundleName()
>     {
>         return "TEXT";
>     }
>
>     /**
>      * Get name of the bitstream format returned by this filter.
>      *
>      * @return "Text"
>      */
>     public String getFormatString()
>     {
>         return "Text";
>     }
>
>     /**
>      * Get a string describing the newly-generated bitstream.
>      *
>      * @return "Extracted text"
>      */
>     public String getDescription()
>     {
>         return "Extracted text";
>     }
>
>     /**
>      * Get a bitstream filled with the extracted text from a DjVu bitstream.
>      * <p>The bitstream supplied as a parameter is written to a DjVu
>      * file on the file system (in the working directory), and the system
>      * command <code>djvutxt</code> is called on the latter to produce a
>      * UTF-8 text file containing the extracted text. The file is then copied
>      * to a bitstream. Finally, the auxiliary files are removed from the file
>      * system, and the generated bitstream is returned as a result.</p>
>      * <p>WARNING! Write access to the working directory is needed for
>      * this method to operate! No exception handling provided!</p>
>      *
>      * @param source
>      *            input stream
>      *
>      * @return result of filter's transformation, written out to a bitstream
>      */
>     public InputStream getDestinationStream(InputStream source)
>             throws Exception
>     {
>         /* Some convenience initializations. */
>         final String cmd = "djvutxt";
>         final String fileName = "aux";
>         final String djvuFileName = fileName + ".djvu";
>         final String txtFileName = fileName + ".txt";
>
>         /* Store input bitstream to auxiliary DjVu file. */
>         File djvuFile = streamToFile(source, djvuFileName);
>
>         /* Invoke external command djvutxt with appropriate arguments
>            to do the actual job...
>          */
>         final String[] cmdArray = {cmd, djvuFileName, txtFileName};
>         Process p = Runtime.getRuntime().exec(cmdArray);
>         /* ...and wait for it to terminate */
>         p.waitFor();
>
>         /* Copy extracted text from file to an independent bitstream,
>            and optionally print the text to standard output. */
>         File txtFile = new File(txtFileName);
>         InputStream dest = fileToStream(txtFile, MediaFilterManager.isVerbose);
>
>         /* Then remove auxiliary files...*/
>         djvuFile.delete();
>         txtFile.delete();
>         /* ...and return resulting bitstream. */
>         return dest;
>     }
>
>     /**
>      * Write given input stream to a file on the file system.
>      * <p>WARNING! No exception handling!</p>
>      *
>      * @param inStream input stream
>      * @param fileName name of the file to be generated
>      *
>      * @return <code>File</code> object associated with the generated file
>      *
>      * @throws Exception
>      */
>     private File streamToFile(InputStream inStream, String fileName)
>             throws Exception
>     {
>         /* Data will be read from input stream in chunks of size e.g. 4KB. */
>         final int chunkSize = 4096;
>         byte[] byteArray = new byte[chunkSize];
>
>         /* Open the stream for buffered reading. */
>         InputStream bufInStream = new BufferedInputStream(inStream);
>
>         /* Create an empty file (if the file already exists, it will be left
>            untouched) to store the supplied bitstream... */
>         File file = new File(fileName);
>         file.createNewFile();
>         /* ...and associate a buffered output stream with it. */
>         OutputStream bufOutStream = new BufferedOutputStream(new FileOutputStream(file));
>
>         /* Copy data from input stream to newly generated file. */
>         int readBytes = -1;
>         while ((readBytes = bufInStream.read(byteArray, 0, chunkSize)) != -1)
>             bufOutStream.write(byteArray, 0, readBytes);
>
>         /* Stop transactions to the file system... */
>         bufOutStream.close();
>         /* ...and return result. */
>         return file;
>     }
>
>     /**
>      * Produce input stream from a given file on the file system.
>      * <p>WARNING!
>      * No exception handling!</p>
>      *
>      * @param file <code>File</code> object associated with the given file
>      * @param verbose if true, also print the file contents to standard output
>      *
>      * @return input stream containing the data read from file
>      *
>      * @throws Exception
>      */
>     private InputStream fileToStream(File file, boolean verbose)
>             throws Exception
>     {
>         /* Open the stream for reading. */
>         InputStream inStream = new FileInputStream(file);
>
>         /* Allocate necessary memory for data buffer. */
>         byte[] byteArray = new byte[(int)file.length()];
>
>         /* Load file contents into buffer. */
>         inStream.read(byteArray);
>
>         /* And immediately close transactions with the file system. */
>         inStream.close();
>
>         /* If required to send the retrieved data to standard output... */
>         if (verbose)
>         {
>             /* Open the file again, but this time handle it as a
>                character stream... */
>             BufferedReader bufReader = new BufferedReader(new FileReader(file));
>             /* ...then print its contents line by line to the
>                standard output... */
>             String lineOfText = null;
>             while ((lineOfText = bufReader.readLine()) != null)
>                 System.out.println(lineOfText);
>             /* ...and close connection to the file. */
>             bufReader.close();
>         }
>
>         /* Finally, generate and return input stream containing desired data. */
>         return new ByteArrayInputStream(byteArray);
>     }
> }
>
> --------------------------------End of source code------------------------------------
>
> Please excuse my poor English and superfluous verbosity!
>
> Best wishes!
>
> Ivan Penev
> _______________________________________________
> Dspace-general mailing list
> [EMAIL PROTECTED]
> http://mailman.mit.edu/mailman/listinfo/dspace-general
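
A possible follow-up on the two warnings in the Javadoc of the filter quoted above (fixed file names "aux.djvu" and "aux.txt" in the working directory, and no error handling): the sketch below shows one way the same "write to file, run djvutxt, read the text back" steps might be done with uniquely named temporary files and a check of the exit status. It is only an illustration, not tested against a DSpace installation. The names DjvuTextSketch, buildCommand, spool and extractText are hypothetical and not part of DSpace; the sketch assumes Java 7 or later and that djvutxt is on the PATH, and it keeps the "djvutxt <input> <output>" invocation used in the quoted code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not DSpace API): extracts text via an external tool. */
final class DjvuTextSketch {

    /** Builds the argument list "tool input output", as the quoted filter does. */
    static List<String> buildCommand(String tool, Path input, Path output) {
        return Arrays.asList(tool, input.toString(), output.toString());
    }

    /** Copies the stream to a uniquely named temporary file with the given suffix. */
    static Path spool(InputStream source, String suffix) {
        try {
            Path tmp = Files.createTempFile("djvufilter", suffix);
            // createTempFile already created the file, so it must be replaced.
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the tool on the stream and returns the bytes of the text file it wrote. */
    static byte[] extractText(InputStream source, String tool)
            throws IOException, InterruptedException {
        Path djvu = spool(source, ".djvu");
        Path txt = Files.createTempFile("djvufilter", ".txt");
        try {
            Process p = new ProcessBuilder(buildCommand(tool, djvu, txt))
                    .inheritIO() // let the tool's messages go to our console
                    .start();
            int status = p.waitFor();
            if (status != 0) {
                throw new IOException(tool + " exited with status " + status);
            }
            return Files.readAllBytes(txt);
        } finally {
            // Remove the auxiliary files even when the tool fails.
            Files.deleteIfExists(djvu);
            Files.deleteIfExists(txt);
        }
    }
}
```

Inside getDestinationStream one could then, under the same assumptions, return new ByteArrayInputStream(DjvuTextSketch.extractText(source, "djvutxt")). Temporary files avoid clashes when two filter runs share a working directory, and they also sidestep the fact that AUX is a reserved device name on Windows.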

