Mark: I know for certain that we have protected PDFs in our archive, and they are indexed without any problems. We are running 1.4.2. After watching the indexing job run, I noticed that DSpace uses a library called bouncycastle, which supports text extraction from encrypted PDFs.
This note can be seen in the 1.4.2 release notes on the wiki: http://wiki.dspace.org/index.php/Release1_4_2

A few months back, we ran into some errors during our indexing as well, and I had to install (or reinstall) these libraries into /<dspace-source>/lib before a rebuild. This was early in my tenure as repository manager, and it is one of those things I should have documented better.

However, if you are using 1.4.2, the bouncycastle libraries (who thought of that name, anyway?) should support encrypted PDFs. (Again, this shows how easily the encryption is avoided.)

Shane Beers
Digital Repository Services Librarian
George Mason University
[EMAIL PROTECTED]
703-993-3742

On Nov 29, 2007, at 9:41 AM, Mark H. Wood wrote:

> On Wed, Nov 28, 2007 at 02:51:15PM -0500, Shane Beers wrote:
>> Additionally, I believe that the mechanism DSpace employs to scan the
>> full-text of a PDF (for indexing purposes) does not pay attention to
>> these security restrictions in the first place, which is a funny
>> sidenote that speaks to the ease of avoiding the security.
>
> I wish. I frequently receive email full of stackdumps from the filter
> cronjob when it trips over a PDF that someone's forgotten to unlock.
> It's a pain having to write those "if you want your document
> indexed..." letters.
>
> --
> Mark H. Wood, Lead System Programmer [EMAIL PROTECTED]
> Typically when a software vendor says that a product is "intuitive" he
> means the exact opposite.
>
> -------------------------------------------------------------------------
> SF.Net email is sponsored by: The Future of Linux Business White Paper
> from Novell. From the desktop to the data center, Linux is going
> mainstream. Let it simplify your IT future.
> http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
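
Following up on the library reinstall described in the message above: one quick way to see whether the bouncycastle classes are visible to a Java process is to try loading one of them by name. The sketch below is only an illustration, not part of DSpace. The class name BouncyCastleCheck and the helper isLoadable are hypothetical, and the assumption is that the jar in /<dspace-source>/lib provides org.bouncycastle.jce.provider.BouncyCastleProvider. It has to be run with the same classpath that the indexing job uses for the result to mean anything.

```java
public class BouncyCastleCheck {

    // Assumption: this is the provider class shipped in the bouncycastle jar.
    // Additional class names can be passed on the command line.
    static final String DEFAULT_CLASS =
            "org.bouncycastle.jce.provider.BouncyCastleProvider";

    // Returns true if the named class can be found on the current classpath.
    // The class is located but not initialized, so no static code runs.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false,
                    BouncyCastleCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] { DEFAULT_CLASS };
        for (String name : names) {
            System.out.println(name + ": "
                    + (isLoadable(name) ? "found" : "NOT found"));
        }
    }
}
```

If the class is reported as not found, that would be consistent with the missing-library situation described above, although it does not by itself prove that the indexing errors come from that cause.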