On 08.05.2015 at 23:47, John Hewson wrote:
Can’t we make PDFBox open the document with an empty password? What’s the story 
for 2.0?

In 2.0 it opens immediately. The same is true in 1.8 when using loadNonSeq().

Tilman


— John

On 8 May 2015, at 08:52, Tilman Hausherr <[email protected]> wrote:

On 08.05.2015 at 17:51, Clemens Wyss DEV wrote:
Thx for the very fast answer.
new StandardDecryptionMaterial( password );
I have no password. The pdf is a public user manual.
Use an empty password :-)

Tilman
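
For readers on the 1.8 branch, the empty-password advice above can be sketched roughly as follows. This is only a minimal sketch, assuming PDFBox 1.8.x is on the classpath (where PDFTextStripper lives in org.apache.pdfbox.util). The class name EncryptedPdfText, the constant EMPTY_PASSWORD and the extract() helper are hypothetical names introduced for illustration; they do not appear in the original messages.

```java
import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.StandardDecryptionMaterial;
import org.apache.pdfbox.util.PDFTextStripper;

class EncryptedPdfText {

    // Hypothetical constant: a PDF that opens in a viewer without a prompt
    // is typically protected with an empty user password.
    static final String EMPTY_PASSWORD = "";

    // Hypothetical helper. Declared with "throws Exception" for brevity,
    // since openProtection() reports several checked exception types.
    static String extract(File pdf) throws Exception {
        PDDocument document = PDDocument.load(pdf);
        try {
            if (document.isEncrypted()) {
                // Decrypt with the empty user password before reading content.
                document.openProtection(new StandardDecryptionMaterial(EMPTY_PASSWORD));
            }
            return new PDFTextStripper().getText(document);
        } finally {
            // Always release the document's resources.
            document.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java EncryptedPdfText manual.pdf
        System.out.println(extract(new File(args[0])));
    }
}
```

If the document needs a real user password, the same call would take that password instead of the empty string.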

That is TIKA, isn't it?
True


-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Friday, 8 May 2015 17:44
To: [email protected]
Subject: Re: extracting text from an "encrypted" pdf

On 08.05.2015 at 17:36, Clemens Wyss DEV wrote:
When I try to extract text from an "encrypted" document (one that can be
read in Acrobat Reader) with:

pdfDocument = PDDocument.load( is );
add
if( document.isEncrypted() )
{
   StandardDecryptionMaterial sdm = new StandardDecryptionMaterial( password );
   document.openProtection( sdm );
}

or use loadNonSeq()

PDFTextStripper pdfStripper = new PDFTextStripper();
parsedText = pdfStripper.getText( pdfDocument );

I get an empty string, and "o.apache.pdfbox.pdfparser.PDFParser - Document is
encrypted" is logged.

When, on the other hand, I do:

ContentHandler handler = new BodyContentHandler( -1 );
ParseContext context = new ParseContext();
parser = new AutoDetectParser();
context.set( Parser.class, parser );
parser.parse( is, handler, metadata, context );
parsedText = handler.toString();

I do get the text content of that same PDF.

1) What is the preferred way to extract text from a
pdf("-that-can-be-read-in-AcrobatReader")?
https://svn.apache.org/viewvc/pdfbox/branches/1.8/pdfbox/src/main/java/org/apache/pdfbox/ExtractText.java?view=markup&sortby=date

2) Does the second approach possibly return "more than text"? Blobs? Binary
data?
That is TIKA, isn't it?

Tilman

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
