On Mon, 2007-01-29 at 17:07 -0200, Afonso Comba de Araujo Neto wrote:
> Hi Gabriel,
> 
> The email you saw was mine. After sending it I did some research on the
> subject and was able to solve it, at least for my case. The fact is that
> DSpace currently doesn't support indexing protected PDF documents. I
> think the main issue is how to manage and retrieve the password,
> which the person submitting the file sometimes doesn't even have.
> In theory you need the password used to protect the PDF in order
> to extract the text from it.
> 
> The error message you saw simply comes from the missing Bouncy Castle jar
> libraries, which is fixed by downloading the jar files (bcprov and bcmail)
> from http://www.bouncycastle.org/latest_releases.html.
> Another issue is that the current version of PDFBox (0.7.3) has
> a bug that prevents any "unprotection". I warned the author, and
> version 0.7.4 (still in development) has this fixed, so you'll
> need to get a nightly build of the PDFBox jar library from
> http://www.pdfbox.org/dist/.
> 
> To actually allow the indexing you'll have to dive a little into the
> PDFBox API, but it's pretty easy and needs only a minor change to the
> PDFFilter.java class. The only thing I can't help you with is how you'll
> obtain the password. In my case I can deduce it from the metadata of the
> item, so everything runs smoothly. I suppose it could be a fixed
> password for all files, which would solve everything but with "less"
> security, though that depends heavily on how the files are generated.
> (Whether protecting a PDF adds any security at all is also debatable.
> We think it _discourages_ copy and paste, so it is worth the trouble.)
> 
> If you have any more questions, you can mail me directly.
> 
> Regards,
> Afonso Araujo Neto
> 
> 

Afonso, thanks for the reply.  We don't have a lot of encrypted PDFs, so
I just did what was necessary to get filter-media running again:
1. Downloaded bcprov-jdk15-135.jar and bcmail-jdk15-135.jar from
http://bouncycastle.org/latest_releases.html and dropped them
into /usr/local/dspace/lib/
2. Recompiled DSpace

Now /dspace/bin/filter-media runs without a hitch.  Should these jar
files be added to the DSpace 1.4.2 release?  Or at least a mention in
the documentation?
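
A quick way to confirm that the jars are actually visible to the JVM is a small classpath probe. This is only a sketch: the class name ClasspathCheck and the method isClassAvailable are made up for illustration and are not part of DSpace or PDFBox. The only name taken from the thread is the Bouncy Castle provider class from the stack trace in the original report.

```java
// Hypothetical diagnostic helper; not part of DSpace, PDFBox, or Bouncy Castle.
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies cannot be linked
            return false;
        }
    }

    public static void main(String[] args) {
        // Class named in the NoClassDefFoundError from filter-media
        String provider = "org.bouncycastle.jce.provider.BouncyCastleProvider";
        if (isClassAvailable(provider)) {
            System.out.println(provider + " found");
        } else {
            System.out.println(provider + " NOT found on the classpath");
        }
    }
}
```

It only tells you something if it is run with the same classpath that filter-media uses. If it reports the provider as not found, the jars are probably not in the lib directory that the script reads.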

Gabe


> 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On behalf of Gabriel
> Farrell
> Sent: Monday, January 29, 2007 15:55
> To: [email protected]
> Subject: [Dspace-tech] filter-media error: bouncycastle
> 
> On running /dspace/bin/filter-media, I get the following error:
> 
> Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
>         at org.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:905)
>       ...
> 
> Since Bouncy Castle [1] is a crypto library I'm assuming pdfbox is
> trying to use it to read protected documents.  Searching the lists, I
> found one post [2] that suggested passwords would be needed for Bouncy
> Castle to work.  Is there a system in place for this?  Also, should
> Bouncy Castle get a mention in the requirements section of the install
> docs?
> 
> Gabe
> 
> [1] http://bouncycastle.org/
> [2] http://sourceforge.net/mailarchive/message.php?msg_id=37612154
> 

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
