Can’t we make PDFBox open the document with an empty password? What’s the story 
for 2.0?

— John

> On 8 May 2015, at 08:52, Tilman Hausherr <[email protected]> wrote:
> 
> On 08.05.2015 at 17:51, Clemens Wyss DEV wrote:
>> Thx for the very fast answer.
>>> new StandardDecryptionMaterial( password );
>> I have no password. The pdf is a public user manual.
> 
> Use an empty password :-)
> 
> Tilman
> 
>> 
>>> That is TIKA, isn't it?
>> True
>> 
>> 
>> -----Original Message-----
>> From: Tilman Hausherr [mailto:[email protected]]
>> Sent: Friday, 8 May 2015 17:44
>> To: [email protected]
>> Subject: Re: extracting text from an "encrypted" pdf
>> 
>> On 08.05.2015 at 17:36, Clemens Wyss DEV wrote:
>>> When I try to extract text from an "encrypted" document (one that can be 
>>> read in AcrobatReader) with:
>>> 
>>> pdfDocument = PDDocument.load( is );
>> add
>> if( document.isEncrypted() )
>> {
>>   StandardDecryptionMaterial sdm = new StandardDecryptionMaterial( password );
>>   document.openProtection( sdm );
>> }
>> 
>> or use loadNonSeq()
>> 
>>> PDFTextStripper pdfStripper = new PDFTextStripper();
>>> parsedText = pdfStripper.getText( pdfDocument );
>>> 
>>> I get an empty string, and " o.apache.pdfbox.pdfparser.PDFParser - Document 
>>> is encrypted" is logged.
>>> 
>>> When, on the other hand, I do:
>>> 
>>> ContentHandler handler = new BodyContentHandler( -1 );
>>> ParseContext context = new ParseContext();
>>> parser = new AutoDetectParser();
>>> context.set( Parser.class, parser );
>>> parser.parse( is, handler, metadata, context );
>>> parsedText = handler.toString();
>>> 
>>> I do get the text content of that same pdf.
>>> 
>>> 1) What is the preferred way to extract text from a 
>>> pdf("-that-can-be-read-in-AcrobatReader")?
>> https://svn.apache.org/viewvc/pdfbox/branches/1.8/pdfbox/src/main/java/org/apache/pdfbox/ExtractText.java?view=markup&sortby=date
>> 
>>>   2) Does the second approach possibly return "more than text"? Blobs? 
>>> Binary data?
>> That is TIKA, isn't it?
>> 
>> Tilman
>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>> 
