Webrtc Go created PDFBOX-2083:
---------------------------------

             Summary: Some characters overlap other characters, font changed
                 Key: PDFBOX-2083
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2083
             Project: PDFBox
          Issue Type: Bug
         Environment: windows8 
            Reporter: Webrtc Go


Hi, please forgive my english first.
I tried to convert a pdf file to images, using pdfbox 1.8.4 within 
tika-app-1.5.jar.
The jpeg files I got were not ideal.
The content in the images were different from the pdf file.
Some characters were in different places, and some characters overlapped others.
There were many lines of console information which read:
'13:49:07,094 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New 
Italic> to the default font 
13:49:07,094 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New 
Italic> to the default font 
13:49:07,095 WARN [PDSimpleFont:107] Changing font on <y> from <Courier New 
Italic> to the default font 
13:49:07,095 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New 
Italic> to the default font 
...'
Could you give me some instruction, tell me how to solve this problem, how to 
get ideal images?
Thanks a lot.

I attached the pdf file and one of the images.
And here are my code:
PDDocument doc = PDDocument.load(input + ".pdf");
List<PDPage> pages = doc.getDocumentCatalog().getAllPages(); 
for (int i = 0; i < pages.size(); i++) { 
    PDPage page = pages.get(i); 
    BufferedImage image = page.convertToImage(); 
    Iterator<ImageWriter> iter = ImageIO.getImageWritersBySuffix("JPG"); 
    ImageWriter writer = iter.next(); 
    File outFile = new File(input + i + ".jpg");
    FileOutputStream out = new FileOutputStream(outFile); 
    ImageOutputStream outImage = ImageIO.createImageOutputStream(out); 
    writer.setOutput(outImage); 
    writer.write(new IIOImage(image, null, null)); 
    writer.dispose(); 
    out.close(); 
} 
doc.close();



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to