Mark:

I know for certain that we have protected PDFs in our archive, and
they get indexed with no issue. This is using 1.4.2. After watching
the job run, I noticed that DSpace uses a library called
bouncycastle, which supports text extraction from encrypted PDFs.

This is mentioned in the release notes for 1.4.2 on the wiki:

http://wiki.dspace.org/index.php/Release1_4_2

A few months back, we ran into some errors during our indexing as
well, and I had to either install or reinstall these libraries into
/<dspace-source>/lib before a rebuild. This was early on in my tenure
as repository manager, and it is one of those things I probably should
have documented better. However, if you are using 1.4.2, the
bouncycastle libraries (who thought of that name, anyway?) should
support encrypted PDFs. (Again, this shows how easily the encryption
is avoided.)
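
A minimal sketch of checking whether the bouncycastle jars are present
in the lib directory before a rebuild. The jar name prefixes (bcprov,
bcmail) are an assumption based on how Bouncy Castle usually names its
jars, and the function names are hypothetical, made up for illustration;
adjust both to match your own checkout.

```python
from pathlib import Path

# Assumption: Bouncy Castle jars are named like bcprov-*.jar and bcmail-*.jar.
# Change these prefixes if the jars in your source tree are named differently.
BC_PREFIXES = ("bcprov", "bcmail")


def find_bouncycastle_jars(lib_dir):
    """Return the sorted names of jars in lib_dir that look like Bouncy Castle."""
    lib = Path(lib_dir)
    if not lib.is_dir():
        # A missing directory simply means nothing was found.
        return []
    return sorted(
        p.name
        for p in lib.iterdir()
        if p.is_file() and p.suffix == ".jar" and p.name.startswith(BC_PREFIXES)
    )


def missing_prefixes(lib_dir):
    """Return the expected prefixes for which no matching jar was found."""
    found = find_bouncycastle_jars(lib_dir)
    return [
        prefix
        for prefix in BC_PREFIXES
        if not any(name.startswith(prefix) for name in found)
    ]
```

Calling missing_prefixes() on your own /<dspace-source>/lib path returns
an empty list when a jar was found for every expected prefix; anything
it does return is a candidate for the install-or-reinstall step above.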

Shane Beers
Digital Repository Services Librarian
George Mason University
[EMAIL PROTECTED]
703-993-3742



On Nov 29, 2007, at 9:41 AM, Mark H. Wood wrote:

> On Wed, Nov 28, 2007 at 02:51:15PM -0500, Shane Beers wrote:
>> Additionally, I believe that the mechanism DSpace employs to scan the
>> full-text of a PDF (for indexing purposes) does not pay attention to
>> these security restrictions in the first place, which is a funny
>> sidenote that speaks to the ease of avoiding the security.
>
> I wish.  I frequently receive email full of stackdumps from the filter
> cronjob when it trips over a PDF that someone's forgotten to unlock.
> It's a pain having to write those "if you want your document
> indexed..." letters.
>
> -- 
> Mark H. Wood, Lead System Programmer   [EMAIL PROTECTED]
> Typically when a software vendor says that a product is "intuitive" he
> means the exact opposite.
>
> -------------------------------------------------------------------------
> SF.Net email is sponsored by: The Future of Linux Business White Paper
> from Novell.  From the desktop to the data center, Linux is going
> mainstream.  Let it simplify your IT future.
> http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech

