On Fri, Jan 20, 2012 at 8:01 AM, Farrell, Larry D <[email protected]> wrote:
> At this point I was primarily targeting PDF and Microsoft Office files that
> would be passed on to our cataloging folks for manual inspection if they were
> DRM protected. As has been pointed out on the list, general DRM detection
> is far trickier than I'd initially thought. I've been using Apache Tika for
> file type detection, metadata and full text extraction. However, when
> parsing encrypted or password protected files it throws the less than
> helpful "Unexpected Runtime Exception".

If you're looking for a marker of "PDFs that need manual inspection," then "causes Tika to throw a runtime exception" might be a pretty good choice.

-n
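
A rough sketch of what that triage could look like, in Python. The `parse` argument is a hypothetical stand-in for whatever wrapper you have around Tika's text extraction (it is not a Tika API), and `triage` is just a name picked for illustration. The only assumption is that the wrapper raises an exception when parsing fails, which will include corrupt files as well as encrypted ones.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def triage(
    paths: Iterable[Path],
    parse: Callable[[Path], str],
) -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Split files into those that parsed and those to hand off for inspection.

    `parse` is a hypothetical callable wrapping the extraction step; any
    exception it raises is treated as "needs manual inspection".
    """
    parsed: List[Path] = []
    needs_inspection: List[Tuple[Path, str]] = []
    for path in paths:
        try:
            parse(path)
        except Exception as exc:  # deliberately broad: any failure is the marker
            # Keep the exception type and message so the catalogers have a hint.
            needs_inspection.append((path, f"{type(exc).__name__}: {exc}"))
        else:
            parsed.append(path)
    return parsed, needs_inspection
```

Anything that lands in the second list goes to the cataloging folks, along with the exception text in case it helps them tell "encrypted" from "just broken."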
