Hi Gabriel,

The email you saw was mine. After sending it I did some research on the
subject and managed to solve the problem, at least for my case. The fact
is that DSpace currently doesn't support indexing protected PDF
documents. I think the main issue is how to manage and retrieve the
password, which sometimes even the person submitting the file doesn't
have. In principle, you need the password used to protect the PDF in
order to extract the text from it.

The error message you saw simply comes from the missing Bouncy Castle
jar libraries. You can fix it by downloading the jar files (bcprov and
bcmail) from http://www.bouncycastle.org/latest_releases.html.
Another issue is that the current version of PDFBox (0.7.3) has a bug
that prevents any "unprotection" of documents. I reported it to the
author, and version 0.7.4 (which is still in development) has it fixed,
so you'll need to get a nightly build of the PDFBox jar library from
http://www.pdfbox.org/dist/.
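
A quick way to confirm that the jar is actually visible is to ask the
class loader for the class named in the stack trace. This is only a
minimal sketch: the class name BouncyCastleCheck and the helper
isClassAvailable are made up for illustration and are not part of
DSpace, PDFBox, or Bouncy Castle. Only the provider class name comes
from the error message.

```java
/** Sketch: report whether the Bouncy Castle provider class is on the classpath. */
final class BouncyCastleCheck {

    // Class name taken from the NoClassDefFoundError in the stack trace.
    static final String PROVIDER_CLASS =
            "org.bouncycastle.jce.provider.BouncyCastleProvider";

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, BouncyCastleCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but not loadable (for example a broken jar).
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(PROVIDER_CLASS + " available: "
                + isClassAvailable(PROVIDER_CLASS));
    }
}
```

The check is only meaningful when run with the same classpath that
filter-media uses; how that classpath is assembled depends on the
installation.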

To actually enable the indexing you'll have to dive a little into the
PDFBox API, but it's pretty easy and needs only a minor change to the
PDFFilter.java class. The only thing I can't help you with is how to
obtain the password. In my case I can deduce it from the item's
metadata, so everything runs smoothly. I suppose you could use a fixed
password for all files, which would solve everything, but with "less"
security. (That is debatable too, since it's not clear that protecting
a PDF adds any security at all. We think it _discourages_ copy and
paste, so it is worth the trouble.) Whether a fixed password is an
option depends heavily on how the files are generated.
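
The password part could be sketched roughly as below. Everything here
is hypothetical: PdfPasswordSketch, PasswordSource, the metadata field
name used in the usage example, and the idea of passing item metadata
as a plain Map are assumptions for illustration, not DSpace or PDFBox
types. The sketch only shows the two options mentioned above (a
password taken from item metadata, with a fixed password as fallback);
the resolved value would then be handed to PDFBox's decryption call
before text extraction, which is not shown because it depends on the
PDFBox build you use.

```java
import java.util.Map;

/** Hypothetical sketch: choosing which password to try for a protected PDF. */
final class PdfPasswordSketch {

    /** Strategy for obtaining a password for one item (illustrative type only). */
    interface PasswordSource {
        /** Returns the password to try, or null when none can be determined. */
        String passwordFor(Map<String, String> itemMetadata);
    }

    /** Same password for every file. */
    static PasswordSource fixed(final String password) {
        return metadata -> password;
    }

    /** Password read from a single metadata field of the item. */
    static PasswordSource fromMetadata(final String fieldName) {
        return metadata -> metadata.get(fieldName);
    }

    /** Tries each source in order; returns the first non-empty password, or null. */
    static String resolve(Map<String, String> itemMetadata, PasswordSource... sources) {
        for (PasswordSource source : sources) {
            String candidate = source.passwordFor(itemMetadata);
            if (candidate != null && !candidate.isEmpty()) {
                return candidate;
            }
        }
        return null; // caller should skip indexing this file
    }
}
```

A caller might use resolve(metadata, fromMetadata("some.field"),
fixed("fallback")) and skip the file when the result is null, so that
an unknown password does not abort the whole filter-media run.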

If you have any more questions, you can mail me directly.

Regards,
Afonso Araujo Neto




-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On behalf of Gabriel Farrell
Sent: Monday, January 29, 2007 15:55
To: [email protected]
Subject: [Dspace-tech] filter-media error: bouncycastle

On running /dspace/bin/filter-media, I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError:
org/bouncycastle/jce/provider/BouncyCastleProvider
        at
org.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:905)
        ...

Since Bouncy Castle [1] is a crypto library I'm assuming pdfbox is
trying to use it to read protected documents.  Searching the lists, I
found one post [2] that suggested passwords would be needed for Bouncy
Castle to work.  Is there a system in place for this?  Also, should
Bouncy Castle get a mention in the requirements section of the install
docs?

Gabe

[1] http://bouncycastle.org/
[2] http://sourceforge.net/mailarchive/message.php?msg_id=37612154

-- 
Gabriel Farrell
Library Systems Developer
Hagerty Library
Drexel University
[EMAIL PROTECTED]
+1 215 895 1871

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
