I found out something very interesting this weekend.  I took a .pdf file
that was "unfilterable"; in other words, filter-media displayed an error
like this:

 "ERROR filtering, skipping bitstream #21220 java.io.IOException: Error:
value is not an integer type actual='--20'" 
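
For anyone who wants to collect the failing bitstreams before sending
samples along, here is a minimal sketch in Python (not part of DSpace)
that pulls the bitstream ID, exception class, and message out of
filter-media output.  The pattern is inferred only from the single error
line above, so other DSpace versions may word it differently, and the
name parse_filter_errors is hypothetical.

```python
import re

# Assumed layout, taken from the one error line quoted above:
#   ERROR filtering, skipping bitstream #<id> <exception class>: <message>
ERROR_RE = re.compile(
    r"ERROR filtering, skipping bitstream #(\d+) ([\w.]+): "
    r"(.*?)(?=\s*ERROR filtering,|$)"
)


def parse_filter_errors(log_text):
    """Return a list of (bitstream_id, exception, message) tuples."""
    # Collapse all whitespace so a message wrapped over two lines
    # (as in the email above) is matched as a single entry.
    flat = " ".join(log_text.split())
    return [
        (int(m.group(1)), m.group(2), m.group(3).strip())
        for m in ERROR_RE.finditer(flat)
    ]


if __name__ == "__main__":
    sample = (
        "ERROR filtering, skipping bitstream #21220 java.io.IOException: Error:\n"
        "value is not an integer type actual='--20'"
    )
    for bitstream_id, exc, msg in parse_filter_errors(sample):
        print(bitstream_id, exc, msg)
```

The list of IDs can then be used to look up the affected files; how the
IDs map to items is left out here because it depends on the local
installation.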

 

On a hunch, I looked at the document and found it had several pages of
graphics/images in it.  I deleted every page in the document that
contained images, and guess what?  It filtered just fine.

 

Hmmm...we have to be able to upload documents that contain images.  NASA
has a LOT of images in their documents.  Now what??

 

Sue Walker-Thornton

NASA Langley Research Center

(757) 224-4074

 

-----Original Message-----
From: Graham Triggs [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 24, 2008 3:13 PM
To: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] filter-media problem - question on size limit

 

If anyone has example PDFs that cause the text extraction to fail
(smaller PDFs preferably!) that they are able to share, please send them
- or a link to retrieve them - to me.

 

Thanks,

G

 

Mark H. Wood wrote:

> I found this:
>
>    http://java-source.net/open-source/pdf-libraries
>
> PJX and PDF Jester look, at first glance, as though they might be
> worth considering.
>
> OTOH it looks like PDFBox might be getting more attention in its new
> home, and if so, then it makes sense to stick with it and help to
> improve it.

>
> ------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech

 


