(Apologies for the length of this message. I want to be sure I include
all the relevant information.)
SUMMARY
I am trying to implement XPDF-based media filters in a DSpace 1.7.2
repository. I have installed the XPDF tool suite, and I've located the
jai_imageio and jai_core JAR files and installed them in my local Maven
repository. I have built DSpace with the "-Pxpdf-mediafilter-support"
option. However, "dspace filter-media" fails when attempting to filter
a PDF in our repository:
ERROR filtering, skipping bitstream:
Item Handle: 192837465/41
Bundle Name: ORIGINAL
File Size: 87931
Checksum: 9ebad8ee9bc7d35238afadffb391531b (MD5)
Asset Store: 0
java.io.IOException: Unknown failure while transforming file to preview: no image produced.
java.io.IOException: Unknown failure while transforming file to preview: no image produced.
    at org.dspace.app.mediafilter.XPDF2Thumbnail.getDestinationStream(XPDF2Thumbnail.java:274)
    at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
    at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
    at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
    at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
    at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:414)
    at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:333)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
I have searched the DSpace-tech archives thoroughly and have not found a
solution, so I'm starting a new topic hoping to find the answer. See
below for details.
* DSpace *
I'm running DSpace 1.7.2 on RHEL5 in a VMWare virtual machine. Since
this is a proof-of-concept repository, I'm running everything under my
own user account (including Tomcat).
* XPDF *
I'm following the directions documented in the DSpace wiki here:
https://wiki.duraspace.org/display/DSDOC/Configuration#Configuration-XPDFFilter.
I downloaded precompiled binaries for XPDF v3.02pl6 and installed them
without problems. I have verified that all the command-line tools work
correctly.
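For anyone who wants to repeat that check from Java rather than from a shell, here is a minimal sketch of a standalone smoke test. It assumes that rendering the first page with pdftoppm's standard -f and -l page-range flags is a fair stand-in for what the filter does (an assumption; I have not confirmed the exact arguments XPDF2Thumbnail passes). The class name PdftoppmSmokeTest and the "page" output root are hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone check, not part of DSpace.
public class PdftoppmSmokeTest {

    // Builds a pdftoppm command line that renders only the first page.
    static List<String> buildCommand(String pdftoppm, String pdf, String outputRoot) {
        return Arrays.asList(pdftoppm, "-f", "1", "-l", "1", pdf, outputRoot);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 3) {
            System.err.println("usage: PdftoppmSmokeTest <pdftoppm> <input.pdf> <output-dir>");
            return;
        }
        File outDir = new File(args[2]);
        String root = new File(outDir, "page").getPath();

        // Merge stderr into stdout so any message from the tool is shown.
        ProcessBuilder pb = new ProcessBuilder(buildCommand(args[0], args[1], root));
        pb.redirectErrorStream(true);
        Process p = pb.start();
        BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = out.readLine()) != null) {
            System.out.println("pdftoppm: " + line);
        }
        out.close();
        System.out.println("exit code: " + p.waitFor());

        // Report whatever files were written under the output root.
        File[] files = outDir.listFiles();
        int count = 0;
        if (files != null) {
            for (File f : files) {
                if (f.getName().startsWith("page")) {
                    System.out.println("produced: " + f.getName() + " (" + f.length() + " bytes)");
                    count++;
                }
            }
        }
        if (count == 0) {
            System.out.println("no image produced");
        }
    }
}
```

Running it with the same pdftoppm path that is configured in dspace.cfg, under the same user account that runs "dspace filter-media", should show whether the tool itself writes an image in that environment.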
* Java Advanced Imaging libraries *
Since the documented link to the "jai_imageio" library is broken, I
searched for the JAR file and found
"jai_imageio-1_1-lib-linux-i586-jar.zip" here:
http://download.java.net/media/jai-imageio/builds/release/1.1/. I
successfully installed it in my local Maven repository.
I found "jai_core" version 1.1.3-alpha here:
http://www.findjar.com/jar/geoserver/jai/jars/jai-core-1.1.3-alpha.jar.html.
I successfully installed it in my local Maven repository as well.
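Installing the JARs in the local Maven repository does not by itself show that they are visible to the JVM that runs the filter. A minimal sketch of one possible check follows. It assumes that the thumbnail filter reads the pdftoppm output through javax.imageio, and that jai_imageio registers its reader under the format name "pnm"; both are assumptions on my part, not something confirmed from the DSpace source. The class name PnmReaderCheck is hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical diagnostic, not part of DSpace.
public class PnmReaderCheck {

    // Returns true if at least one ImageIO reader is registered for the format name.
    static boolean hasReaderFor(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // List every reader format the current classpath provides.
        String[] names = ImageIO.getReaderFormatNames();
        Arrays.sort(names);
        System.out.println("Registered reader formats: " + Arrays.toString(names));

        // "pnm" is the assumed format name for the jai_imageio PNM reader.
        System.out.println("PNM reader available: " + hasReaderFor("pnm"));
    }
}
```

The result only means something if the class is run with the same classpath that "dspace filter-media" uses; run against a bare JDK it would be expected to report no PNM reader.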
* Maven dependencies *
I edited "[dspace-source]/dspace/pom.xml" so it would correctly resolve
the dependencies to the versions of the libraries I downloaded:
<profile>
    <id>xpdf-mediafilter-support</id>
    <activation>
        <activeByDefault>false</activeByDefault>
    </activation>
    <dependencies>
        <dependency>
            <groupId>com.sun.media</groupId>
            <artifactId>jai_imageio</artifactId>
            <version>1.1</version>
        </dependency>
        <dependency>
            <groupId>javax.media</groupId>
            <artifactId>jai_core</artifactId>
            <version>1.1.3-alpha</version>
        </dependency>
    </dependencies>
</profile>
* DSpace configuration *
I edited my "[dspace-source]/config/dspace.cfg" to make the
configuration changes required to use XPDF:
# maximum width and height of generated thumbnails
thumbnail.maxwidth = 80
thumbnail.maxheight = 80
# XPDF Media Filter executables
xpdf.path.pdftotext = /home/sthursto/devtools/xpdf-3.02pl6-linux/pdftotext
xpdf.path.pdftoppm = /home/sthursto/devtools/xpdf-3.02pl6-linux/pdftoppm
xpdf.path.pdfinfo = /home/sthursto/devtools/xpdf-3.02pl6-linux/pdfinfo
plugin.named.org.dspace.app.mediafilter.FormatFilter = \
org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor, \
org.dspace.app.mediafilter.XPDF2Thumbnail = PDF Thumbnail, \
org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
org.dspace.app.mediafilter.PowerPointFilter = PowerPoint Text Extractor, \
org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail, \
org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG
filter.org.dspace.app.mediafilter.XPDF2Text.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.XPDF2Thumbnail.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.HTMLFilter.inputFormats = HTML, Text
filter.org.dspace.app.mediafilter.WordFilter.inputFormats = Microsoft Word
filter.org.dspace.app.mediafilter.PowerPointFilter.inputFormats = Microsoft Powerpoint, Microsoft Powerpoint XML
filter.org.dspace.app.mediafilter.JPEGFilter.inputFormats = BMP, GIF, JPEG, image/png
filter.org.dspace.app.mediafilter.BrandedPreviewJPEGFilter.inputFormats = BMP, GIF, JPEG, image/png
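The plugin list depends on backslash line continuations, so an accidental line break inside a value would silently truncate it. Here is a minimal sketch of how a fragment could be checked, assuming dspace.cfg is parsed with java.util.Properties semantics (an assumption about DSpace's configuration loader). The class name CfgContinuationCheck and the shortened two-entry fragment are for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical diagnostic, not part of DSpace.
public class CfgContinuationCheck {

    // Parses properties-format text and returns the resolved value for one key.
    static String resolve(String cfgText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            // Not expected for an in-memory reader.
            throw new IllegalStateException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String key = "plugin.named.org.dspace.app.mediafilter.FormatFilter";

        // Shortened fragment: every continued line ends with a backslash.
        String fragment =
              key + " = \\\n"
            + "  org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor, \\\n"
            + "  org.dspace.app.mediafilter.XPDF2Thumbnail = PDF Thumbnail\n";

        // Print the value as a properties parser sees it after joining the lines.
        System.out.println(resolve(fragment, key));
    }
}
```

If the printed value stops before the last filter, a continuation backslash is missing somewhere in the list.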
* Build and update DSpace *
I built and updated DSpace as follows:
cd [dspace-source]/dspace
mvn -Pxpdf-mediafilter-support -Ddb.name=oracle package
cd target/dspace-1.7.2-build.dir
ant update
I restarted Tomcat and tried running the media filters, but the job
failed with the error documented above.
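One thing worth ruling out after "ant update" is whether the JAI JARs actually reached the installed lib directory. Here is a minimal sketch of such a scan, assuming the command-line launcher builds its classpath from the JARs under [dspace]/lib (an assumption I have not verified). The class name JaiJarScan is hypothetical, and matching on "jai" in the file name is only a heuristic.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic, not part of DSpace.
public class JaiJarScan {

    // Returns the sorted names of JAR files in libDir whose name contains "jai".
    static List<String> findJaiJars(File libDir) {
        List<String> hits = new ArrayList<String>();
        File[] files = libDir.listFiles();
        if (files == null) {
            return hits; // directory missing or unreadable
        }
        for (File f : files) {
            String name = f.getName().toLowerCase();
            if (name.endsWith(".jar") && name.contains("jai")) {
                hits.add(f.getName());
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: JaiJarScan <path-to-dspace-lib-dir>");
            return;
        }
        List<String> hits = findJaiJars(new File(args[0]));
        if (hits.isEmpty()) {
            System.out.println("No JAI JARs found in " + args[0]);
        } else {
            System.out.println("JAI JARs found: " + hits);
        }
    }
}
```

An empty result would suggest the profile's dependencies were resolved at build time but never copied into the installation.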
FURTHER DEBUG
1. I tried enabling DEBUG logging in DSpace, but no useful messages were
produced when I ran "dspace filter-media":
2011-07-15 12:03:54,633 INFO org.dspace.core.ConfigurationManager @ Loading from classloader: file:/home/sthursto/dspace/config/dspace.cfg
2011-07-15 12:03:54,657 INFO org.dspace.core.ConfigurationManager @ Using dspace provided log configuration (log.init.config)
2011-07-15 12:03:54,657 INFO org.dspace.core.ConfigurationManager @ Loading: /home/sthursto/dspace/config/log4j.properties
2011-07-15 12:03:55,594 DEBUG net.sf.ehcache.config.ConfigurationFactory @ Configuring ehcache from InputStream
2011-07-15 12:03:55,675 DEBUG net.sf.ehcache.config.BeanHandler @ Ignoring ehcache attribute xmlns:xsi
2011-07-15 12:03:55,675 DEBUG net.sf.ehcache.config.BeanHandler @ Ignoring ehcache attribute xsi:noNamespaceSchemaLocation
2011-07-15 12:03:55,676 DEBUG net.sf.ehcache.config.DiskStoreConfiguration @ Disk Store Path: /tmp
2011-07-15 12:03:55,684 DEBUG net.sf.ehcache.config.ConfigurationHelper @ No CacheManagerEventListenerFactory class specified. Skipping...
2011-07-15 12:03:55,692 DEBUG net.sf.ehcache.config.ConfigurationHelper @ No BootstrapCacheLoaderFactory class specified. Skipping...
2011-07-15 12:03:55,692 DEBUG net.sf.ehcache.config.ConfigurationHelper @ No CacheExceptionHandlerFactory class specified. Skipping...
2011-07-15 12:03:55,762 DEBUG net.sf.ehcache.util.UpdateChecker @ Checking for update...
2011-07-15 12:03:56,262 INFO net.sf.ehcache.util.UpdateChecker @ New update(s) found: 2.4.3 [2]
2011-07-15 12:03:56,825 INFO org.dspace.search.DSIndexer @ Writing Collection: 192837465/2 to Index
2011-07-15 12:03:56,862 INFO org.dspace.search.DSIndexer @ Writing Community: 192837465/1 to Index
2011-07-15 12:03:57,104 DEBUG net.sf.ehcache.CacheManager @ CacheManager already shutdown
2011-07-15 12:03:57,105 ERROR org.dspace.kernel.DSpaceKernelManager @ WARN Failed to unregister the MBean: org.dspace:name=abc3e6e6-c56a-402d-86d9-d260ae182226,type=DSpaceKernel
(I don't see any mention of the XPDF tools being run.)
2. I found a post in the archives that said there was a missing
dependency in "[dspace-source]/dspace-api/pom.xml", so I added it and
rebuilt DSpace:
<!-- Added to resolve XPDF filter errors -->
<dependency>
    <groupId>com.sun.media</groupId>
    <artifactId>jai_imageio</artifactId>
    <version>1.1</version>
</dependency>
This had no effect whatsoever.
So, now I seem to be stuck. I'd appreciate any suggestions you could offer.
--
Scott Thurston [email protected]
NOAA / NGDC / WDC http://www.ngdc.noaa.gov/
Marine Geology & Geophysics 303-497-4411 (phone)
325 Broadway E/GC3 303-497-6513 (fax)
Boulder, CO 80305-3337