Clemens Wyss created PDFBOX-1821:
------------------------------------

             Summary: Parsing (extracting text from) a single 5 MB PDF file takes 3 minutes
                 Key: PDFBOX-1821
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1821
             Project: PDFBox
          Issue Type: Bug
         Environment: Win7 (8 GB memory), Java 6
            Reporter: Clemens Wyss


When I extract the text of the attached PDF file with the following code:
...
PDFTextStripper stripper = new PDFTextStripper();
OutputStream os = null;
Writer writer = null;
PDDocument document = null;
File file = new File("takes3mins.pdf");
...
    // Load the attached PDF
    document = PDDocument.load(file);

    // Destination for the extracted text
    File outFile = new File("c:/tmp/gugus.txt");
    os = new FileOutputStream(outFile);
    writer = new OutputStreamWriter(os);

    // Extract the text of all pages into the writer
    stripper.writeText(document, writer);
...
it takes approximately 3 minutes, whereas Adobe Acrobat Reader opens the same file in a few seconds.
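To see whether the time is spent in PDDocument.load or in stripper.writeText, each call can be timed on its own. Below is a minimal, Java 6-compatible sketch of such a timer. The class ExtractionTimer, its nested Timed type and its time method are hypothetical helpers, not part of PDFBox. The demo task in main is a stand-in for the PDFBox calls, so the example runs without the library on the classpath.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of PDFBox) for timing a single step of the extraction. */
public class ExtractionTimer {

    /** Value returned by the timed task plus the elapsed wall-clock time. */
    public static final class Timed<T> {
        public final T value;
        public final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the task once and measures elapsed time with System.nanoTime(). */
    public static <T> Timed<T> time(Callable<T> task) throws Exception {
        long start = System.nanoTime();
        T value = task.call();
        long elapsed = System.nanoTime() - start;
        return new Timed<T>(value, TimeUnit.NANOSECONDS.toMillis(elapsed));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in task; in the report this would wrap PDDocument.load(file)
        // or stripper.writeText(document, writer), each timed separately.
        Timed<Integer> t = time(new Callable<Integer>() {
            public Integer call() {
                int sum = 0;
                for (int i = 0; i < 1000; i++) {
                    sum += i;
                }
                return sum;
            }
        });
        System.out.println("result=" + t.value + ", elapsed=" + t.elapsedMillis + " ms");
    }
}
```

Timing the load and the text extraction separately shows which of the two steps accounts for the 3 minutes, which narrows down where to look in PDFBox.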



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
