Hello,

I am trying to get the XPDF-based PDF thumbnail creation working. It works fine
in my DSpace 4.1 test instance.

The feature was already available in DSpace 1.8.2, which is still our production
release. Instead of waiting until the new version is production-ready, I am
installing the features step by step in the production environment.


On the production machine, I get this error:

esxh-15:/srv/dspace# bin/dspace filter-media -i 2339/4318 -v
The following MediaFilters are enabled: 
Full Filter Name: org.dspace.app.mediafilter.HTMLFilter
org.dspace.app.mediafilter.HTMLFilter
Full Filter Name: org.dspace.app.mediafilter.WordFilter
org.dspace.app.mediafilter.WordFilter
Full Filter Name: org.dspace.app.mediafilter.JPEGFilter
org.dspace.app.mediafilter.JPEGFilter
Full Filter Name: org.dspace.app.mediafilter.XPDF2Text
org.dspace.app.mediafilter.XPDF2Text
Full Filter Name: org.dspace.app.mediafilter.XPDF2Thumbnail
org.dspace.app.mediafilter.XPDF2Thumbnail
Full Filter Name: org.dspace.app.mediafilter.PowerPointFilter
org.dspace.app.mediafilter.PowerPointFilter
SKIPPED: bitstream 27442 (item: 2339/4318) because 'Limmerstraße.pdf.txt' already exists
ERROR filtering, skipping bitstream:

        Item Handle: 2339/4318
        Bundle Name: ORIGINAL
        File Size: 2667225
        Checksum: 3db0096cb62b6d595c1e4bb77f6833d0 (MD5)
        Asset Store: 0
javax.imageio.IIOException: Can't read input file!
javax.imageio.IIOException: Can't read input file!
        at javax.imageio.ImageIO.read(ImageIO.java:1291)
        at org.dspace.app.mediafilter.XPDF2Thumbnail.getDestinationStream(XPDF2Thumbnail.java:244)
        at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:353)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
Updating search index:


Note that the text extraction succeeded in an earlier run of filter-media, so
the PDF itself is readable and the message "Can't read input file!" is not very
credible if it refers to the PDF. Also, the method that was running when the
exception occurred was XPDF2Thumbnail.getDestinationStream, which suggests that
the issue might not be with the input file but with creating the output file.
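
For what it is worth, ImageIO.read(File) throws an IIOException with this
message when the File it is given does not exist or is not readable. So the
message may refer to whatever file XPDF2Thumbnail hands to ImageIO.read rather
than to the PDF. Below is a minimal sketch to reproduce the message outside
DSpace; the class name ReadCheck and the default path are made up for
illustration.

```java
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

class ReadCheck {
    // Returns the exception message from ImageIO.read(File),
    // or null if the call did not throw.
    static String tryRead(File f) {
        try {
            ImageIO.read(f);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical default path; pass a real path as the first argument.
        File candidate = new File(args.length > 0 ? args[0] : "/tmp/does-not-exist.jpg");
        System.out.println(candidate + " -> " + tryRead(candidate));
    }
}
```

Running it against a path that does not exist should print the same message
as in the stack trace above.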


In 2012, Osama Alkadi reported a similar issue and solved it by updating the
pdftoppm tool. On Debian and Ubuntu, the required tools are contained in the
package poppler-utils. I have installed version 0.18.4 on both the test and
the production machine. Here is the output:

esxh-15:/srv/dspace# pdftoppm -v
pdftoppm version 0.18.4
Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC

The version numbering seems to have changed in unexpected ways, as Osama Alkadi
wrote that he updated from 3.0 to 3.0.2. For the moment, this does not help
much.

All other components involved are also the same on both machines. jai_imageio 
is version 1.1 and jai_core is 1.1.3.


As the file is hard to find in the assetstore, I downloaded it using the
browser, copied it back to the server with scp and converted it manually using
pdftoppm -jpeg inputfile.pdf outputname. It worked.
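
The manual run shows that pdftoppm can convert the file, but not which file
name it gives the result. Below is a rough sketch that runs the same command
and lists what appears next to the PDF. The class name OutputNameCheck and the
prefix thumbcheck are hypothetical, and pdftoppm is assumed to be on the PATH.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class OutputNameCheck {
    // Returns the sorted names of all files in dir that start with prefix.
    static String[] outputsFor(File dir, final String prefix) {
        String[] names = dir.list(new FilenameFilter() {
            public boolean accept(File d, String name) {
                return name.startsWith(prefix);
            }
        });
        if (names == null) {
            return new String[0]; // dir does not exist or is not a directory
        }
        Arrays.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 1) {
            System.err.println("usage: java OutputNameCheck file.pdf");
            return;
        }
        File pdf = new File(args[0]).getAbsoluteFile();
        File dir = pdf.getParentFile();
        String prefix = "thumbcheck"; // hypothetical output prefix

        // Same invocation as the manual test: pdftoppm -jpeg input outputprefix
        ProcessBuilder pb = new ProcessBuilder(
                "pdftoppm", "-jpeg", pdf.getPath(), new File(dir, prefix).getPath());
        pb.redirectErrorStream(true);
        Process p = pb.start();

        // Drain the tool's output so the process cannot block on a full pipe.
        InputStream out = p.getInputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = out.read(buf)) != -1) {
            System.out.write(buf, 0, n);
        }
        System.out.println("pdftoppm exit code: " + p.waitFor());

        // Every file in the directory whose name starts with the prefix.
        for (String name : outputsFor(dir, prefix)) {
            System.out.println("found: " + name);
        }
    }
}
```

Comparing the listed names between the test and the production machine might
show whether the two pdftoppm installations really behave the same.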

I exported the item containing the file using the AIP packager, transferred it 
to the test server running DSpace 4.1, imported it and ran filter-media there. 
It worked fine.

I compared the installation instructions of DSpace 4.1 and 1.8.2 and could not
find a significant difference regarding the XPDF feature. The mvn package and
ant update commands did not show any irregularities.

File permissions in the assetstore did not show any problems. On both machines,
DSpace runs as the daemon user tomcat7. In both cases, I run Tomcat 7, albeit
in slightly different versions. But Tomcat is not involved in running a
command-line tool such as bin/dspace filter-media anyway.
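
Since the command-line tool runs as whichever user starts it, not necessarily
as tomcat7, it may be worth checking that this user can create and read back a
file in the JVM's temp directory. This is only a sketch under the assumption,
which I have not verified in the source, that the filter writes its
intermediate files to java.io.tmpdir. TmpDirCheck is a made-up name.

```java
import java.io.File;
import java.io.IOException;

class TmpDirCheck {
    // True if the current user can create a file in java.io.tmpdir
    // and that file is both readable and writable.
    static boolean canRoundTrip() {
        try {
            File f = File.createTempFile("tmpcheck", ".tmp");
            boolean ok = f.canRead() && f.canWrite();
            f.delete();
            return ok;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("user:   " + System.getProperty("user.name"));
        System.out.println("tmpdir: " + System.getProperty("java.io.tmpdir"));
        System.out.println("usable: " + canRoundTrip());
    }
}
```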

So far, I have not found a clue as to where to look for the cause. If anybody
has an idea, I'd be grateful.

Bye, Christian


_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
