Jeff,
Why don't you just open the offending .pdf in Adobe and then do a
"Save As" to a .txt file? You can then import it back into DSpace. Just make
sure that in your "contents" file you specify that this document is to be imported
into the TEXT bundle, NOT the ORIGINAL bundle. You can then run index-all, and
your previously unfilterable/non-searchable document will be full-text
searchable!
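For reference, here is a minimal sketch of the "contents" file step, written in Python. It assumes the DSpace Simple Archive Format convention of one bitstream per line, with the filename followed by a tab and a `bundle:NAME` qualifier. The `write_contents` helper, the `item_000` directory, and the filenames are hypothetical; check the exact format against the import documentation for your DSpace version.

```python
from pathlib import Path
import tempfile


def write_contents(item_dir, entries):
    """Write a Simple Archive Format 'contents' file into item_dir.

    entries is an iterable of (filename, bundle) pairs. Each pair becomes
    one line: the filename, a tab, then 'bundle:' plus the bundle name.
    (Assumed format; verify against your DSpace version's docs.)
    """
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)
    lines = [f"{name}\tbundle:{bundle}" for name, bundle in entries]
    path = item_dir / "contents"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory; all names here are hypothetical.
    with tempfile.TemporaryDirectory() as tmp:
        contents = write_contents(
            Path(tmp) / "item_000",
            [
                ("report.pdf", "ORIGINAL"),  # the PDF itself stays in ORIGINAL
                ("report.pdf.txt", "TEXT"),  # the "Save As" text goes into TEXT
            ],
        )
        print(contents.read_text(encoding="utf-8"), end="")
```

The ORIGINAL line only matters if you are (re)importing the whole item; the key point is that the .txt file is tagged for the TEXT bundle so index-all picks it up.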
Sue
________________________________
From: Jeffrey Trimble [mailto:[email protected]]
Sent: Wednesday, April 08, 2009 9:36 AM
To: DSpace Tech
Subject: [Dspace-tech] Java Heap dumps during Filter-Media
I've run into a funky situation. After switching to the distributed PDFBox and
the associated jars (Bouncy Castle), filter-media works really, really well,
until--
We have one PDF that causes filter-media to produce a memory dump/Java heap
dump. The errors are reported by the IBM flavor of the JVM. When we removed
the offending PDF from the database, filter-media went on its way merrily.
Has anyone seen anything like this? I have a copy of the heap dump and trace,
and I can reproduce the problem on demand by placing this PDF back into the IR.
If you have seen this and were able to resolve it, please let me know. The
only thing I can think of doing is to rescan the PDF from the original and see
whether the problem resolves itself with the new scan.
Thanks in advance,
Jeffrey Trimble
System Librarian
William F. Maag Library
Youngstown State University
330.941.2483 (Office)
[email protected]
http://www.maag.ysu.edu
http://digital.maag.ysu.edu
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech