On Fri, Jan 20, 2012 at 8:01 AM, Farrell, Larry D
<[email protected]> wrote:
> At this point I was primarily targeting PDF and Microsoft Office files that 
> would be passed on to our cataloging folks for manual inspection if they were 
> DRM protected.  As has been pointed out on the list, general DRM detection 
> is far trickier than I'd initially thought.  I've been using Apache Tika for 
> file type detection, metadata and full text extraction.  However, when 
> parsing encrypted or password-protected files it throws the less than 
> helpful "Unexpected Runtime Exception".

If you're looking for a marker of "PDFs that need manual inspection,"
then "causes Tika to throw a runtime exception" might be a pretty good
choice.
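
A rough sketch of that triage, in Python for brevity. The `extract`
callable is a hypothetical stand-in for whatever wrapper you have around
Tika's parsing, and `fake_extract` exists only so the example runs on
its own; neither is a real Tika API.

```python
from typing import Callable, Iterable, List, Tuple


def triage(
    paths: Iterable[str],
    extract: Callable[[str], str],
) -> Tuple[List[str], List[str]]:
    """Split paths into (parsed, needs_manual_inspection).

    Any exception raised by `extract` is treated as the marker that a
    file should go to a human, whatever the underlying cause.
    """
    parsed: List[str] = []
    manual: List[str] = []
    for path in paths:
        try:
            extract(path)
        except Exception:
            # The parser choked: queue the file for manual inspection.
            manual.append(path)
        else:
            parsed.append(path)
    return parsed, manual


def fake_extract(path: str) -> str:
    # Hypothetical extractor: pretends files named *.locked.pdf are
    # encrypted and fails on them the way an opaque parser error would.
    if path.endswith(".locked.pdf"):
        raise RuntimeError("Unexpected Runtime Exception")
    return "text of " + path


parsed, manual = triage(["a.pdf", "b.locked.pdf", "c.doc"], fake_extract)
```

The catch is deliberately broad, so corrupt or otherwise unparseable
files land in the same queue as protected ones. That is probably fine
if a person is going to look at them anyway.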

-n
