Hi Gabriel,

The email you saw was mine. After posting it I did some research on the subject and managed to solve the problem, at least for my case. The fact is that DSpace currently doesn't support indexing protected PDF documents. I think the main issue is how to manage and retrieve the password, which sometimes the person submitting the file doesn't even have. In principle you do need the password used to protect the PDF in order to extract the text from it.
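Separately from the password question, the NoClassDefFoundError in your trace names org/bouncycastle/jce/provider/BouncyCastleProvider. A quick way to confirm whether that class is visible is a small check like the one below. This is only a minimal sketch: the class name BouncyCastleCheck and the helper isClassAvailable are made up for illustration and are not part of DSpace or PDFBox, and the check is only meaningful if it is run with the same classpath that filter-media uses.

```java
// Hypothetical helper, not part of DSpace or PDFBox.
class BouncyCastleCheck {

    // Class named in the NoClassDefFoundError from the filter-media trace.
    static final String BC_PROVIDER =
            "org.bouncycastle.jce.provider.BouncyCastleProvider";

    // Returns true if the named class can be located by this class's
    // class loader. The class is not initialized, only looked up.
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false,
                    BouncyCastleCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isClassAvailable(BC_PROVIDER)) {
            System.out.println("Bouncy Castle provider found on the classpath.");
        } else {
            System.out.println("Bouncy Castle provider NOT found; "
                    + "the bcprov jar is probably missing from the classpath.");
        }
    }
}
```

If the check reports the class as missing, the jar is not where the JVM is looking, whatever else may be wrong with the document itself.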
The error message you saw simply comes from the missing Bouncy Castle jar library, which is fixed by downloading the jar files (bcprov and bcmail) from http://www.bouncycastle.org/latest_releases.html. Another issue is that the current version of PDFBox (0.7.3) has a bug that prevents any "unprotection". I reported it to the author, and version 0.7.4 (which is still in development) has it fixed, so you'll need to get a nightly build of the PDFBox jar library from http://www.pdfbox.org/dist/.

To actually allow the indexing you'll have to dive a little into the PDFBox API, but it's pretty easy and needs only a minor change to the PDFFilter.java class. The only thing I can't help you with is how you'll obtain the password. In my case I can deduce it from the metadata of the item, so everything runs smoothly. I suppose it could be a fixed password for all files, which would solve everything but with "less" security (which is also debatable, since it is not clear that protecting a PDF adds any security at all; we think it _discourages_ copy and paste, so it is worth the trouble). Whether that is feasible depends heavily on how the files are generated.

If you have any more questions, you can mail me directly.

Regards,
Afonso Araujo Neto

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Gabriel Farrell
Sent: Monday, January 29, 2007 15:55
To: [email protected]
Subject: [Dspace-tech] filter-media error: bouncycastle

On running /dspace/bin/filter-media, I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
        at org.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:905)
        ...

Since Bouncy Castle [1] is a crypto library I'm assuming pdfbox is trying to use it to read protected documents. Searching the lists, I found one post [2] that suggested passwords would be needed for Bouncy Castle to work. Is there a system in place for this?
Also, should Bouncy Castle get a mention in the requirements section of the install docs?

Gabe

[1] http://bouncycastle.org/
[2] http://sourceforge.net/mailarchive/message.php?msg_id=37612154

--
Gabriel Farrell
Library Systems Developer
Hagerty Library
Drexel University
[EMAIL PROTECTED]
+1 215 895 1871

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech

