Hello,
I am trying to get the XPDF-based PDF thumbnail creation working. It works fine
in my DSpace 4.1 test instance.
The feature is already available in DSpace 1.8.2, which is still our production
release. Instead of waiting until the new version is production-ready, I am
installing the features step by step in the production environment.
On the production machine, I get this error:
esxh-15:/srv/dspace# bin/dspace filter-media -i 2339/4318 -v
The following MediaFilters are enabled:
Full Filter Name: org.dspace.app.mediafilter.HTMLFilter
org.dspace.app.mediafilter.HTMLFilter
Full Filter Name: org.dspace.app.mediafilter.WordFilter
org.dspace.app.mediafilter.WordFilter
Full Filter Name: org.dspace.app.mediafilter.JPEGFilter
org.dspace.app.mediafilter.JPEGFilter
Full Filter Name: org.dspace.app.mediafilter.XPDF2Text
org.dspace.app.mediafilter.XPDF2Text
Full Filter Name: org.dspace.app.mediafilter.XPDF2Thumbnail
org.dspace.app.mediafilter.XPDF2Thumbnail
Full Filter Name: org.dspace.app.mediafilter.PowerPointFilter
org.dspace.app.mediafilter.PowerPointFilter
SKIPPED: bitstream 27442 (item: 2339/4318) because 'Limmerstraße.pdf.txt'
already exists
ERROR filtering, skipping bitstream:
Item Handle: 2339/4318
Bundle Name: ORIGINAL
File Size: 2667225
Checksum: 3db0096cb62b6d595c1e4bb77f6833d0 (MD5)
Asset Store: 0
javax.imageio.IIOException: Can't read input file!
javax.imageio.IIOException: Can't read input file!
at javax.imageio.ImageIO.read(ImageIO.java:1291)
at org.dspace.app.mediafilter.XPDF2Thumbnail.getDestinationStream(XPDF2Thumbnail.java:244)
at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:353)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
Updating search index:
Note that the text extraction took place in an earlier run of filter-media, so
the PDF itself was readable then, and the message "Can't read input file!" is
not very credible as a statement about the PDF. Also, the method in which the
exception occurred was XPDF2Thumbnail.getDestinationStream, which suggests that
the issue might lie not with the input file but with creating the output file.
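To illustrate where this message comes from: ImageIO.read(File) throws an
IIOException with exactly this text when the File it is given cannot be read,
for example because it does not exist. The minimal sketch below shows only
that behaviour. The class name, the helper readMessage and the file name are
made up for the illustration, and it is only an assumption that the filter
hands ImageIO.read the image that pdftoppm was supposed to write.

```java
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class CantReadInputDemo {

    /** Returns the message of the exception thrown by ImageIO.read(File), or null if none. */
    static String readMessage(File f) {
        try {
            ImageIO.read(f);
            return null;
        } catch (IOException e) {
            // IIOException is a subclass of IOException
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical path standing in for an image that was never created.
        File missing = new File("no-such-dir", "hypothetical-output.ppm");
        System.out.println(missing + " exists: " + missing.exists());
        System.out.println("message: " + readMessage(missing));
    }
}
```

If that assumption holds, the "input file" in the message would be the
intermediate image rather than the PDF stored in the assetstore.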
In 2012, Osama Alkadi reported a similar issue and solved it by updating the
pdftoppm tool. On Debian and Ubuntu, the required tools are contained in the
package poppler-utils. I have installed version 0.18.4 on both the test and the
production machine. Here is the output:
esxh-15:/srv/dspace# pdftoppm -v
pdftoppm version 0.18.4
Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
The version numbering seems to have changed in unexpected ways, as Osama Alkadi
wrote that he had updated from 3.0 to 3.0.2. For the moment, this does not help
much.
All other components involved are also the same on both machines. jai_imageio
is version 1.1 and jai_core is 1.1.3.
As the file is hard to find in the assetstore, I downloaded it using the
browser, copied it back to the server with scp and converted it manually using
pdftoppm -jpeg inputfile.pdf outputname. It worked.
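The manual conversion ran from an interactive shell. A rough sketch of
repeating the same call from inside a JVM follows, so that it can be started
under the account that runs filter-media. It uses only the -jpeg option from
the command above. The class PdftoppmProbe and its helpers run and listOutputs
are hypothetical names for this illustration and not part of DSpace. It lists
whatever files appear with the given prefix instead of assuming how pdftoppm
names them.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

public class PdftoppmProbe {

    /** Runs a command, echoes its output, and returns the exit code (-1 if it could not be run). */
    static int run(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        try {
            Process p = pb.start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    System.out.println(line);
                }
            }
            return p.waitFor();
        } catch (IOException e) {
            System.out.println("could not run " + command + ": " + e.getMessage());
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    /** Lists the files in dir whose names start with prefix (empty if dir is unreadable). */
    static File[] listOutputs(File dir, String prefix) {
        File[] found = dir.listFiles((d, name) -> name.startsWith(prefix));
        return found == null ? new File[0] : found;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: java PdftoppmProbe inputfile.pdf outputname");
            return;
        }
        File prefix = new File(args[1]).getAbsoluteFile();
        int exit = run(Arrays.asList("pdftoppm", "-jpeg", args[0], prefix.getPath()));
        System.out.println("exit code: " + exit);
        // Show which files pdftoppm actually produced and whether this user can read them.
        for (File f : listOutputs(prefix.getParentFile(), prefix.getName())) {
            System.out.println(f + " readable: " + f.canRead());
        }
    }
}
```

Running it as java PdftoppmProbe inputfile.pdf outputname under the DSpace
user would show the exit code, the names of the files that were written, and
whether that user can read them.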
I exported the item containing the file using the AIP packager, transferred it
to the test server running DSpace 4.1, imported it and ran filter-media there.
It worked fine.
I compared the installation instructions of DSpace 4.1 and 1.8.2 and could not
find a significant difference regarding the XPDF feature. The mvn package and
ant update commands did not show any irregularities.
File permissions in the assetstore did not show any problems. On both machines,
DSpace runs as the daemon user tomcat7. In both cases, I run Tomcat 7, albeit
in slightly different versions. But Tomcat is not involved in running a
command-line tool like bin/dspace filter-media anyway.
So far, I have not found a clue where to search for the reason. If anybody has
an idea, I'd be grateful.
Bye, Christian