On Mon, 2007-01-29 at 17:07 -0200, Afonso Comba de Araujo Neto wrote:
> Hi Gabriel,
>
> The email you saw was mine. After that I did some research on the
> subject and was able to solve it, for my case at least. The fact is that
> DSpace currently doesn't support indexing protected PDF documents. I
> think the main issue is how to manage and retrieve the password,
> which sometimes the person submitting the file doesn't even have.
> In principle you do need the password used to protect the PDF to be
> able to extract the text from it.
>
> The error message you saw is simply caused by the missing Bouncy Castle
> jar libraries, which you can fix by downloading the jar files from
> http://www.bouncycastle.org/latest_releases.html (bcprov and bcmail).
> Another issue is that the current version of PDFBox (0.7.3) actually has
> a bug that prevents any "unprotection". I warned the author, and
> version 0.7.4 (which is still in development) has this fixed, so you'll
> need to get a nightly build of the PDFBox jar library from here:
> http://www.pdfbox.org/dist/.
>
> To actually allow the indexing you'll have to dive a little into the
> PDFBox API, but it's pretty easy, needing only a minor change to the
> PDFFilter.java class. The only thing I can't help you with is how you'll
> obtain the password. In my case I can deduce it from the metadata of the
> item, so everything runs smoothly. I suppose it could be a fixed
> password for all files, which would solve everything, but with "less"
> security (which is also debatable, since it's unclear whether protecting
> a PDF adds any security at all; we think it _discourages_ copy and paste,
> so it is worth the trouble). But that depends heavily on how the files
> are generated.
>
> Any more questions, you can mail me directly.
>
> Regards,
> Afonso Araujo Neto
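A quick way to confirm the diagnosis above, that the NoClassDefFoundError comes from the Bouncy Castle classes not being visible to the JVM, is to ask Java directly whether it can load the provider class named in the stack trace (org.bouncycastle.jce.provider.BouncyCastleProvider). The following is a minimal standalone sketch, not part of DSpace or PDFBox; the class name ClasspathCheck is hypothetical. It assumes you run it with the same classpath filter-media uses (i.e. with the jars from your DSpace lib directory on -cp), and it needs Java 7 or later for the multi-catch.

```java
// Hypothetical helper: reports whether a class can be loaded from the current classpath.
final class ClasspathCheck {

    // Returns true if the named class can be found and loaded.
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: load the class without running its static initializer
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class is absent.
            // LinkageError (e.g. NoClassDefFoundError): found, but a dependency is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // The provider class named in the filter-media stack trace
        String bc = "org.bouncycastle.jce.provider.BouncyCastleProvider";
        System.out.println(bc + (isOnClasspath(bc) ? " found" : " NOT found"));
    }
}
```

Loading with initialize set to false only checks that the class can be located and loaded; it does not register Bouncy Castle as a security provider, so a "found" result means the jar is visible, not that decryption will succeed.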
Afonso, thanks for the reply. We don't have a lot of encrypted PDFs, so I
just did what was necessary to get filter-media running again:

1. Downloaded bcprov-jdk15-135.jar and bcmail-jdk15-135.jar from
   http://bouncycastle.org/latest_releases.html and dropped them into
   /usr/local/dspace/lib/
2. Recompiled DSpace

Now /dspace/bin/filter-media runs without a hitch. Should these jar files
be added to the DSpace 1.4.2 release? Or at least be mentioned in the
documentation?

Gabe

>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On behalf of Gabriel
> Farrell
> Sent: Monday, January 29, 2007 15:55
> To: [email protected]
> Subject: [Dspace-tech] filter-media error: bouncycastle
>
> On running /dspace/bin/filter-media, I get the following error:
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/bouncycastle/jce/provider/BouncyCastleProvider
>         at org.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:905)
> ...
>
> Since Bouncy Castle [1] is a crypto library, I'm assuming pdfbox is
> trying to use it to read protected documents. Searching the lists, I
> found one post [2] that suggested passwords would be needed for Bouncy
> Castle to work. Is there a system in place for this? Also, should Bouncy
> Castle get a mention in the requirements section of the install docs?
>
> Gabe
>
> [1] http://bouncycastle.org/
> [2] http://sourceforge.net/mailarchive/message.php?msg_id=37612154

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech

