Can’t we make PDFBox open the document with an empty password? What’s the story for 2.0?
— John

> On 8 May 2015, at 08:52, Tilman Hausherr <[email protected]> wrote:
>
> On 08.05.2015 at 17:51, Clemens Wyss DEV wrote:
>> Thx for the very fast answer.
>>> new StandardDecryptionMaterial( password );
>> I have no password. The pdf is a public user manual.
>
> Use an empty password :-)
>
> Tilman
>
>>
>>> That is TIKA, isn't it?
>> True
>>
>>
>> -----Original Message-----
>> From: Tilman Hausherr [mailto:[email protected]]
>> Sent: Friday, 8 May 2015 17:44
>> To: [email protected]
>> Subject: Re: extracting text from an "encrypted" pdf
>>
>> On 08.05.2015 at 17:36, Clemens Wyss DEV wrote:
>>> When I try to extract an "encrypted" (which can be read in AcrobatReader)
>>> document with:
>>>
>>> pdfDocument = PDDocument.load( is );
>> add
>> if( document.isEncrypted() )
>> {
>>     StandardDecryptionMaterial sdm = new StandardDecryptionMaterial( password );
>>     document.openProtection( sdm );
>> }
>>
>> or use loadNonSeq()
>>
>>> PDFTextStripper pdfStripper = new PDFTextStripper();
>>> parsedText = pdfStripper.getText( pdfDocument );
>>>
>>> I get an empty string, and " o.apache.pdfbox.pdfparser.PDFParser - Document
>>> is encrypted" is logged.
>>>
>>> When, on the other hand, I do:
>>>
>>> ContentHandler handler = new BodyContentHandler( -1 );
>>> ParseContext context = new ParseContext();
>>> parser = new AutoDetectParser();
>>> context.set( Parser.class, parser );
>>> parser.parse( is, handler, metadata, context );
>>> parsedText = handler.toString();
>>>
>>> I get to see the text/content of the very pdf.
>>>
>>> 1) What is the preferred way to extract text from a
>>> pdf("-that-can-be-read-in-AcrobatReader")?
>> https://svn.apache.org/viewvc/pdfbox/branches/1.8/pdfbox/src/main/java/org/apache/pdfbox/ExtractText.java?view=markup&sortby=date
>>
>>> 2) Does the second approach possibly return "more than text"? Blobs?
>>> Binary data?
>> That is TIKA, isn't it?
>>
>> Tilman
>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
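
For reference, here is a minimal sketch of the suggestion quoted above: open the document with an empty password before stripping the text. It targets the PDFBox 1.8.x API that the thread discusses; the 2.0 API may differ. The class name ExtractEncryptedText, the helper extractText, and the command-line path argument are made up for illustration. The sketch assumes the PDF can be opened with the empty user password, as in the "public user manual" case.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.StandardDecryptionMaterial;
import org.apache.pdfbox.util.PDFTextStripper;

// Hypothetical example class; assumes PDFBox 1.8.x on the classpath.
public class ExtractEncryptedText {

    // Loads a PDF from the stream and returns its text. If the document is
    // flagged as encrypted, it is first opened with an empty password.
    static String extractText(InputStream in) throws Exception {
        PDDocument document = PDDocument.load(in);
        try {
            if (document.isEncrypted()) {
                // Assumption: the empty user password is enough to open the file.
                StandardDecryptionMaterial sdm = new StandardDecryptionMaterial("");
                document.openProtection(sdm);
            }
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(document);
        } finally {
            // Always release the document's resources.
            document.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is a placeholder for the path to the PDF file.
        InputStream in = new FileInputStream(args[0]);
        try {
            System.out.println(extractText(in));
        } finally {
            in.close();
        }
    }
}
```

If the file needs a real user password, openProtection() should fail with an exception rather than return empty text, which at least makes the problem visible to the caller.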
