[ https://issues.apache.org/jira/browse/PDFBOX-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001478#comment-14001478 ]
Webrtc Go edited comment on PDFBOX-2083 at 5/19/14 8:09 AM:
------------------------------------------------------------
the jpeg file is from page No.11 of the pdf file
was (Author: webrtcgo):
the jpeg file is form page No.11 of the pdf file
> Some characters overlap other characters, font changed
> ------------------------------------------------------
>
> Key: PDFBOX-2083
> URL: https://issues.apache.org/jira/browse/PDFBOX-2083
> Project: PDFBox
> Issue Type: Bug
> Environment: windows8
> Reporter: Webrtc Go
> Attachments: technical-guide.pdf, vgsdmuuhd5ak03orqudq10.jpg
>
>
> Hi, please forgive my English.
> I tried to convert a PDF file to images, using PDFBox 1.8.4 from
> tika-app-1.5.jar.
> The JPEG files I got were not correct.
> The content of the images was different from the PDF file.
> Some characters were in different places, and some characters overlapped
> others.
> There were many lines of console output which read:
> '13:49:07,094 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New Italic> to the default font
> 13:49:07,094 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New Italic> to the default font
> 13:49:07,095 WARN [PDSimpleFont:107] Changing font on <y> from <Courier New Italic> to the default font
> 13:49:07,095 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New Italic> to the default font
> ...'
> Could you tell me how to solve this problem and get correct images?
> Thanks a lot.
> I attached the PDF file and one of the images.
> And here is my code:
> PDDocument doc = PDDocument.load(input + ".pdf");
> List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
> for (int i = 0; i < pages.size(); i++) {
>     PDPage page = pages.get(i);
>     BufferedImage image = page.convertToImage();
>     Iterator<ImageWriter> iter = ImageIO.getImageWritersBySuffix("JPG");
>     ImageWriter writer = iter.next();
>     File outFile = new File(input + i + ".jpg");
>     FileOutputStream out = new FileOutputStream(outFile);
>     ImageOutputStream outImage = ImageIO.createImageOutputStream(out);
>     writer.setOutput(outImage);
>     writer.write(new IIOImage(image, null, null));
>     writer.dispose();
>     out.close();
> }
> doc.close();
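
A side note on the JPEG-writing part of the quoted code: the ImageOutputStream there is never closed, so buffered data may not reach the file, and no compression quality is set. Below is a minimal sketch of that step using only javax.imageio from the JDK. The class name JpegWriteSketch and the method writeJpeg are hypothetical names chosen for illustration, and the sketch assumes the image is an RGB BufferedImage. It does not address the font substitution warnings, which come from the rendering step and not from the image writer.

```java
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Iterator;

// Hypothetical helper, not part of PDFBox or Tika.
final class JpegWriteSketch {

    /** Writes an RGB image as a JPEG file with an explicit quality in [0.0, 1.0]. */
    static void writeJpeg(BufferedImage image, File outFile, float quality) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
        if (!writers.hasNext()) {
            throw new IOException("No JPEG ImageWriter is available");
        }
        ImageWriter writer = writers.next();

        // Remove any old file first so no stale trailing bytes are left behind.
        Files.deleteIfExists(outFile.toPath());

        // try-with-resources closes (and flushes) the stream even if write() fails.
        try (ImageOutputStream out = ImageIO.createImageOutputStream(outFile)) {
            if (out == null) {
                throw new IOException("Cannot create an image output stream for " + outFile);
            }
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);

            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
    }
}
```

Inside the page loop this could be called as JpegWriteSketch.writeJpeg(image, new File(input + i + ".jpg"), 0.9f); the value 0.9f is only an example quality setting.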