Hi...

Last week I wrote to the list about a strange problem I was having with 
xpdf-3.02 and filter-media on my DSpace 1.6.2 instance.  Here is the 
final update for anyone who might be interested.

The problem was that filter-media worked fine on my test server, but text 
extraction on production was failing with the message:
java.io.IOException: pdftotext failed, maybe corrupt PDF? status=9

My test and production machines are virtually mirrors of each other when it 
comes to setup.
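
For anyone chasing a similar failure, one way to narrow it down is to run 
pdftotext outside DSpace on both machines and compare the exit status and 
error output.  The following is only a minimal sketch, not something that was 
run as part of this report.  It assumes Python 3.7 or later and that pdftotext 
is on the PATH; run_extractor is a hypothetical helper name.

```python
import subprocess
import sys


def run_extractor(command):
    """Run an external command and return (exit_status, stderr_text).

    exit_status is None when the command could not be started at all.
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError as exc:
        # For example: the binary is missing or not executable for this user.
        return None, str(exc)
    return result.returncode, result.stderr.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_pdftotext.py /path/to/sample.pdf
    pdf_path = sys.argv[1]
    # "-" as the output file asks pdftotext to write the text to stdout.
    status, errors = run_extractor(["pdftotext", pdf_path, "-"])
    print("exit status:", status)
    print("stderr:", errors or "(empty)")
```

Running this against the same PDF on both servers, preferably as the same 
user that runs filter-media, may show whether the failure comes from 
pdftotext itself or from the environment it is launched in.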

I tried reinstalling xpdf on my production machine, but I still couldn't get 
pdftotext to work properly.  In desperation (because I had a lot of recent 
PDFs that needed to be indexed), I went back to using PDFBox in my 
filter-media, and everything is working fine now.

In the end, I have no idea why xpdf would not work on my production machine, 
but for now my problem is fixed.

George Kozak
Digital Library Specialist
Cornell University Library Information Technologies (CUL-IT)
501 Olin Library
Cornell University
Ithaca, NY 14853
607-255-8924

From: George Stanley Kozak
Sent: Friday, February 11, 2011 10:22 AM
To: dspace-tech@lists.sourceforge.net
Subject: Strange problem with xpdf

Hi...

I am using xpdf-3.02 with filter-media on my DSpace 1.6.2 instance.

On my test server, running filter-media works fine.  On my production server, I 
have discovered that pdftotext is failing with:
java.io.IOException: pdftotext failed, maybe corrupt PDF? status=9
        at org.dspace.app.mediafilter.XPDF2Text.getDestinationStream(XPDF2Text.java:159)

The same PDFs that filter successfully on the test server fail on the 
production server.

I have checked the xpdf binaries and they are correct (I even recompiled them 
on Production).  The libraries seem to be correct.

Does anyone have any ideas as to why this would work on my test instance and 
not on my production instance?

By the way, I built my instance using "mvn -Pxpdf-mediafilter-support -U clean 
package"


