(Apologies for the length of this message. I want to be sure I include 
all the relevant information.)

SUMMARY
I am trying to implement XPDF-based media filters in a DSpace 1.7.2 
repository.  I have installed the XPDF tool suite, and I've located the 
jai_imageio and jai_core JAR files and installed them in my local Maven 
repository.  I have built DSpace with the "-Pxpdf-mediafilter-support" 
option.  However, "dspace filter-media" fails when attempting to filter 
a PDF in our repository:

ERROR filtering, skipping bitstream:

         Item Handle: 192837465/41
         Bundle Name: ORIGINAL
         File Size: 87931
         Checksum: 9ebad8ee9bc7d35238afadffb391531b (MD5)
         Asset Store: 0
java.io.IOException: Unknown failure while transforming file to preview: no image produced.
java.io.IOException: Unknown failure while transforming file to preview: no image produced.
         at org.dspace.app.mediafilter.XPDF2Thumbnail.getDestinationStream(XPDF2Thumbnail.java:274)
         at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
         at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
         at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
         at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
         at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:414)
         at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:333)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)

I have searched the DSpace-tech archives thoroughly and have not found a 
solution, so I'm starting a new topic hoping to find the answer.  See 
below for details.

* DSpace *
I'm running DSpace 1.7.2 on RHEL5 in a VMware virtual machine. Since 
this is a proof-of-concept repository, I'm running everything under my 
own user account (including Tomcat).

* XPDF *
I'm following the directions documented in the DSpace wiki here: 
https://wiki.duraspace.org/display/DSDOC/Configuration#Configuration-XPDFFilter.

I downloaded precompiled binaries for XPDF v3.02pl6 and installed them 
without problems.  I have verified that all the command-line tools work 
correctly.
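
For anyone who wants to repeat that check against the failing PDF, here is a minimal sketch of rendering the first page by hand and seeing whether an image file appears. It assumes a POSIX shell. `render_first_page` is a hypothetical helper name, and `-q`, `-f` and `-l` are pdftoppm's quiet, first-page and last-page options. I have not confirmed which options XPDF2Thumbnail itself passes, so this only tests the tool, not the filter.

```shell
# Hypothetical helper: render page 1 of a PDF with pdftoppm and list what it wrote.
# Usage: render_first_page /path/to/pdftoppm input.pdf /tmp/out-prefix
render_first_page() {
    tool="$1"; pdf="$2"; prefix="$3"
    # -q: suppress messages, -f/-l: first and last page to convert
    "$tool" -q -f 1 -l 1 "$pdf" "$prefix" || {
        echo "pdftoppm exited with status $?" >&2
        return 1
    }
    # pdftoppm names its output after the prefix; list whatever appeared
    ls "$prefix"* 2>/dev/null || {
        echo "no image produced for $pdf" >&2
        return 1
    }
}
```

If this prints a file name when run as the same user that runs "dspace filter-media", the tool itself is probably not the problem.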

* Java Advanced Imaging libraries *
Since the documented link to the "jai_imageio" library is broken, I 
searched for the JAR file and found 
"jai_imageio-1_1-lib-linux-i586-jar.zip" here: 
http://download.java.net/media/jai-imageio/builds/release/1.1/.  I 
successfully installed it in my local Maven repository.

I found "jai_core" version 1.1.3-alpha here: 
http://www.findjar.com/jar/geoserver/jai/jars/jai-core-1.1.3-alpha.jar.html. 
I successfully installed it in my local Maven repository as well.

* Maven dependencies *
I edited "[dspace-source]/dspace/pom.xml" so it would correctly resolve 
the dependencies to the versions of the libraries I downloaded:

<profile>
<id>xpdf-mediafilter-support</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<dependencies>
<dependency>
<groupId>com.sun.media</groupId>
<artifactId>jai_imageio</artifactId>
<version>1.1</version>
</dependency>
<dependency>
<groupId>javax.media</groupId>
<artifactId>jai_core</artifactId>
<version>1.1.3-alpha</version>
</dependency>
</dependencies>
</profile>
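
To double-check that the coordinates above match what is actually in the local repository, a sketch like the following can help. It assumes Maven's default local repository location (~/.m2/repository) and the standard groupId/artifactId/version directory layout. `m2_jar_path` is a hypothetical helper name.

```shell
# Hypothetical helper: print where Maven's default local repository would keep a JAR.
# Usage: m2_jar_path groupId artifactId version
m2_jar_path() {
    group_dir=$(printf '%s' "$1" | tr '.' '/')   # com.sun.media -> com/sun/media
    echo "$HOME/.m2/repository/$group_dir/$2/$3/$2-$3.jar"
}

for coords in "com.sun.media jai_imageio 1.1" "javax.media jai_core 1.1.3-alpha"; do
    # word-splitting of $coords into three arguments is intentional
    jar=$(m2_jar_path $coords)
    if [ -f "$jar" ]; then echo "found   $jar"; else echo "missing $jar"; fi
done
```

A "missing" line would mean the JAR was installed under different coordinates than the ones the profile asks for.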

* DSpace configuration *
I edited my "[dspace-source]/config/dspace.cfg" to make the 
configuration changes required to use XPDF:

# maximum width and height of generated thumbnails
thumbnail.maxwidth  = 80
thumbnail.maxheight = 80

# XPDF Media Filter executables
xpdf.path.pdftotext = /home/sthursto/devtools/xpdf-3.02pl6-linux/pdftotext
xpdf.path.pdftoppm = /home/sthursto/devtools/xpdf-3.02pl6-linux/pdftoppm
xpdf.path.pdfinfo = /home/sthursto/devtools/xpdf-3.02pl6-linux/pdfinfo

plugin.named.org.dspace.app.mediafilter.FormatFilter = \
   org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor, \
   org.dspace.app.mediafilter.XPDF2Thumbnail = PDF Thumbnail, \
   org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
   org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
   org.dspace.app.mediafilter.PowerPointFilter = PowerPoint Text Extractor, \
   org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail, \
   org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG
filter.org.dspace.app.mediafilter.XPDF2Text.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.XPDF2Thumbnail.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.HTMLFilter.inputFormats = HTML, Text
filter.org.dspace.app.mediafilter.WordFilter.inputFormats = Microsoft Word
filter.org.dspace.app.mediafilter.PowerPointFilter.inputFormats = Microsoft Powerpoint, Microsoft Powerpoint XML
filter.org.dspace.app.mediafilter.JPEGFilter.inputFormats = BMP, GIF, JPEG, image/png
filter.org.dspace.app.mediafilter.BrandedPreviewJPEGFilter.inputFormats = BMP, GIF, JPEG, image/png
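
As a sanity check on the xpdf.path.* settings, here is a minimal sketch that reads them back out of a dspace.cfg and reports whether each one points at an executable file. `check_xpdf_paths` is a hypothetical helper name, and the sketch assumes each setting sits on a single "key = value" line as shown above.

```shell
# Hypothetical helper: report whether each xpdf.path.* entry in a dspace.cfg is executable.
# Usage: check_xpdf_paths /path/to/dspace.cfg
check_xpdf_paths() {
    grep -E '^xpdf\.path\.[a-z]+ *=' "$1" | while IFS='=' read -r key value; do
        # trim surrounding whitespace from the configured path
        tool=$(printf '%s' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        if [ -x "$tool" ]; then
            echo "OK $tool"
        else
            echo "NOT EXECUTABLE $tool"
        fi
    done
}
```

It is probably most useful when pointed at the installed copy of the file (the one the running DSpace loads) rather than the copy in the source tree, in case the two differ.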

* Build and update DSpace *
I built and updated DSpace as follows:

cd [dspace-source]/dspace
mvn -Pxpdf-mediafilter-support -Ddb.name=oracle package
cd target/dspace-1.7.2-build.dir
ant update

I restarted Tomcat and tried running the media filters, but the job 
failed with the error documented above.
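
One thing worth verifying after the update is whether the JAI JARs actually reached the installed DSpace. A minimal sketch, assuming the command-line tools load their JARs from the lib directory under the DSpace install directory (an assumption on my part), with `list_jai_jars` as a hypothetical helper name:

```shell
# Hypothetical helper: list JAI JARs in a directory, or say that none were found.
# Usage: list_jai_jars /path/to/dspace/lib
list_jai_jars() {
    found=$(ls "$1" 2>/dev/null | grep -i '^jai.*\.jar$' || true)
    if [ -n "$found" ]; then
        echo "$found"
    else
        echo "no JAI JARs in $1"
        return 1
    fi
}
```

If neither jai_imageio nor jai_core shows up there, the profile's dependencies were presumably not packaged into the build.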

FURTHER DEBUG
1. I tried enabling DEBUG logging in DSpace, but no useful messages were 
produced when I ran "dspace filter-media":

2011-07-15 12:03:54,633 INFO  org.dspace.core.ConfigurationManager @ Loading from classloader: file:/home/sthursto/dspace/config/dspace.cfg
2011-07-15 12:03:54,657 INFO  org.dspace.core.ConfigurationManager @ Using dspace provided log configuration (log.init.config)
2011-07-15 12:03:54,657 INFO  org.dspace.core.ConfigurationManager @ Loading: /home/sthursto/dspace/config/log4j.properties
2011-07-15 12:03:55,594 DEBUG net.sf.ehcache.config.ConfigurationFactory @ Configuring ehcache from InputStream
2011-07-15 12:03:55,675 DEBUG net.sf.ehcache.config.BeanHandler @ Ignoring ehcache attribute xmlns:xsi
2011-07-15 12:03:55,675 DEBUG net.sf.ehcache.config.BeanHandler @ Ignoring ehcache attribute xsi:noNamespaceSchemaLocation
2011-07-15 12:03:55,676 DEBUG net.sf.ehcache.config.DiskStoreConfiguration @ Disk Store Path: /tmp
2011-07-15 12:03:55,684 DEBUG net.sf.ehcache.config.ConfigurationHelper @ No CacheManagerEventListenerFactory class specified. Skipping...
2011-07-15 12:03:55,692 DEBUG net.sf.ehcache.config.ConfigurationHelper @ No BootstrapCacheLoaderFactory class specified. Skipping...
2011-07-15 12:03:55,692 DEBUG net.sf.ehcache.config.ConfigurationHelper @ No CacheExceptionHandlerFactory class specified. Skipping...
2011-07-15 12:03:55,762 DEBUG net.sf.ehcache.util.UpdateChecker @ Checking for update...
2011-07-15 12:03:56,262 INFO  net.sf.ehcache.util.UpdateChecker @ New update(s) found: 2.4.3 [2]
2011-07-15 12:03:56,825 INFO  org.dspace.search.DSIndexer @ Writing Collection: 192837465/2 to Index
2011-07-15 12:03:56,862 INFO  org.dspace.search.DSIndexer @ Writing Community: 192837465/1 to Index
2011-07-15 12:03:57,104 DEBUG net.sf.ehcache.CacheManager @ CacheManager already shutdown
2011-07-15 12:03:57,105 ERROR org.dspace.kernel.DSpaceKernelManager @ WARN Failed to unregister the MBean: org.dspace:name=abc3e6e6-c56a-402d-86d9-d260ae182226,type=DSpaceKernel

(I don't see any mention of running XPDF tools)

2. I found a post in the archives that said there was a missing 
dependency in "[dspace-source]/dspace-api/pom.xml", so I added it and 
rebuilt DSpace:

<!-- Added to resolve XPDF filter errors -->
<dependency>
<groupId>com.sun.media</groupId>
<artifactId>jai_imageio</artifactId>
<version>1.1</version>
</dependency>

This had no effect whatsoever.

So, now I seem to be stuck.  I'd appreciate any suggestions you could offer.

-- 
Scott Thurston                  [email protected]
NOAA / NGDC / WDC               http://www.ngdc.noaa.gov/
Marine Geology & Geophysics     303-497-4411 (phone)
325 Broadway E/GC3              303-497-6513 (fax)
Boulder, CO 80305-3337


_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech