Hello -

The errors you posted appear to be related to the filtering of PDF and 
Word documents.  I'm not sure what the file size limits are.

For what it's worth, here are some steps you can try to build JPEG 
thumbnails of TIFFs in DSpace:

- Enter TIFF in the DSpace bitstream format registry with mime-type 
image/tiff and file extensions tiff and tif

- Edit the dspace.cfg file and add image/tiff and TIFF to the 
filter.org.dspace.app.mediafilter.JPEGFilter.inputFormats line

- Download and install the Java Advanced Imaging I/O tools, currently 
available here:
https://jai-imageio.dev.java.net/binary-builds.html
These tools contain a TIFF plugin that will allow the JPEGFilter to 
read the TIFF format.

- Verify that the bitstreams are marked as TIFF format, and then run 
the filter-media script to build the JPEG thumbnails for the TIFFs.  Be 
patient.  If you see memory errors with large TIFF files, you can try 
raising the maximum heap size in the dsrun script (the "-Xmx256m" 
parameter, for example to "-Xmx512m") to resolve the problem.
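
Before running filter-media, it may help to confirm that the JVM can 
actually see a TIFF reader and how much heap it has been given.  The 
following is only a rough diagnostic sketch, not part of DSpace; the 
class name TiffSupportCheck and its methods are made up for this 
example.  It assumes you run it with the same Java installation and 
classpath that dsrun uses, otherwise the answer may not match what 
the JPEGFilter sees.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical helper, not part of DSpace.
class TiffSupportCheck {

    // True if any Image I/O reader is registered for the given format name.
    static boolean hasReaderFor(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    // Maximum heap the JVM will try to use, in megabytes (reflects -Xmx).
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("TIFF reader registered: " + hasReaderFor("tiff"));
        System.out.println("Maximum heap (MB): " + maxHeapMegabytes());
    }
}
```

If the first line reports false, the TIFF plugin is probably not 
installed for that JVM.  If the heap figure does not reflect the -Xmx 
value you set, the script you edited is probably not the one being 
run.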

For certain types of images, you may need to write a custom filter or 
modify the JPEGFilter to get better results. For example, if you have 
large TIFF files that are primarily black and white, the JPEGFilter 
will favor speed over appearance when resampling the image to the 
thumbnail-sized JPEG, and the resulting thumbnail won't look much like 
the original.  You might need a filter that uses a different 
resampling method.
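
As an illustration of what a different resampling method could look 
like, here is a minimal sketch that shrinks an image by repeated 
halving with bilinear interpolation.  This is not the JPEGFilter's 
code and not a complete DSpace filter; the class name ThumbnailSketch 
and the method scaleToFit are invented for the example, and whether 
it improves your particular images is something you would have to 
test.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Hypothetical sketch, not part of DSpace.
class ThumbnailSketch {

    // Shrink the image to fit inside maxWidth x maxHeight, keeping the
    // aspect ratio and never enlarging. Works in steps of at most 2x so
    // that each output pixel blends its neighbours instead of skipping them.
    static BufferedImage scaleToFit(BufferedImage source, int maxWidth, int maxHeight) {
        int width = source.getWidth();
        int height = source.getHeight();
        double factor = Math.min(1.0,
                Math.min((double) maxWidth / width, (double) maxHeight / height));
        int targetWidth = Math.max(1, (int) Math.round(width * factor));
        int targetHeight = Math.max(1, (int) Math.round(height * factor));

        BufferedImage current = source;
        do {
            width = Math.max(targetWidth, width / 2);
            height = Math.max(targetHeight, height / 2);
            current = draw(current, width, height);
        } while (width > targetWidth || height > targetHeight);
        return current;
    }

    // Draw the image at the given size into a new RGB image (suitable for JPEG).
    private static BufferedImage draw(BufferedImage image, int width, int height) {
        BufferedImage scaled = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(image, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return scaled;
    }

    // Usage: java ThumbnailSketch input.tif output.jpg
    public static void main(String[] args) throws Exception {
        BufferedImage source = ImageIO.read(new File(args[0]));
        if (source == null) {
            System.err.println("No Image I/O reader could decode " + args[0]);
            return;
        }
        ImageIO.write(scaleToFit(source, 80, 80), "jpg", new File(args[1]));
    }
}
```

The 80x80 bound in main is just a placeholder for whatever thumbnail 
size you use.  The stepwise approach costs more time and memory than 
a single resize, which is the trade-off against the speed-oriented 
default.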

-- Keith
Systems Developer
OhioLINK


Branko Kovacevic wrote:
> Dear All,
>
> So far we've been uploading jpg images into our DSpace system and had
> no problems with getting thumbnails for them later.
>
> Unfortunately, after recently uploading a dozen items with TIFF
> images (between 4 and 15 MB in size), we couldn't get thumbnails for
> them. The filter-media script returns an error message. Here is the
> portion of the log file with some critical messages:
>
> ERROR filtering, skipping bitstream #7542
> java.io.FileNotFoundException: no such entry: "0Table"
> java.io.FileNotFoundException: no such entry: "0Table"
>    at org.apache.poi.poifs.filesystem.DirectoryNode.getEntry(DirectoryNode.java:283)
>    at org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java:60)
>    at org.dspace.app.mediafilter.WordFilter.getDestinationStream(WordFilter.java:97)
>    at org.dspace.app.mediafilter.MediaFilter.processBitstream(MediaFilter.java:155)
>    at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:327)
>    at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:296)
>    at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:266)
>    at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:234)
>    at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:185)
> java.lang.Throwable: Warning: You did not close the PDF Document
>    at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
>    at gnu.gcj.runtime.FinalizerThread.run(libgcj.so.70)
> java.lang.Throwable: Warning: You did not close the PDF Document
>    at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
>    at gnu.gcj.runtime.FinalizerThread.run(libgcj.so.70)
> java.lang.Throwable: Warning: You did not close the PDF Document
>    at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
>    at gnu.gcj.runtime.FinalizerThread.run(libgcj.so.70)
> java.lang.Throwable: Warning: You did not close the PDF Document
>    at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
>    at gnu.gcj.runtime.FinalizerThread.run(libgcj.so.70)
> java.lang.Throwable: Warning: You did not close the PDF Document
>    at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
>    at gnu.gcj.runtime.FinalizerThread.run(libgcj.so.70)
> FILTERED: bitstream 7682 and created 'articles_bridging_20000615.pdf.txt'
> FILTERED: bitstream 7683 and created 'articles_sustainable_developement_20000815.pdf.txt'
> GC Warning: Repeated allocation of very large block (appr. size 20230144):
>         May lead to memory leak and poor performance.
> FILTERED: bitstream 7684 and created 'articles_venture_20001215.pdf.txt'
> FILTERED: bitstream 7685 and created 'articles_rethinking_20010215.pdf.txt'
> FILTERED: bitstream 7686 and created 'articles_relationship_20010515.pdf.txt'
> FILTERED: bitstream 7687 and created 'articles_org_capacity_20021115.pdf.txt'
> GC Warning: Out of Memory!  Returning NIL!
> Exception in thread "main" java.lang.OutOfMemoryError
>    <<No stacktrace available>>
>
> Is there any limit on the file size for filtering?
> Any help is highly appreciated.
>
> Best regards,
> Branko Kovacevic
>
> Records Coordinator
> Open Society Archives
> Arany Janos u. 32
> 1051 Budapest, Hungary
> phone: (36-1) 327-3266  or 327-2029
> e-mail: [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
> website: www.osa.ceu.hu <http://www.osa.ceu.hu>
> ++++++++++++++++++++++++++++
>
>



_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
